Source-map style position annotation #133

chrysn · 2023-02-15T12:51:55Z

For interactive editing (highlighting cursor positions in a two-paned hex and diagnostic view), or for debugging (implementing pd-body-error-position), it would be cool to match ranges of bytes encoded in CBOR to ranges of bytes encoded in diagnostic notation -- similar to how a compiler outputs debug information matching instructions to source lines.

This is tangentially related to #20, as it would pave the way to color-highlighting hex output.

One thing that'll make this relatively hard for this crate is that it's interconverting via a mutable AST (which on its own is great, just needs some more effort here). A relatively easy API would be to turn a CBOR byte string into a DN text string (or vice versa), and also produce a source map as a list of corresponding (frequently nested) ranges. There's probably a design pattern by which the AST can keep cursors in two serializations, but I don't know how to make a pretty API out of it, or how to do it with neither pinning nor Rc'ing nor indices for which it isn't completely clear which slice they relate to.
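To make the "list of corresponding (frequently nested) ranges" idea concrete, here is a minimal sketch of what such a source map could look like. None of these names (`MapEntry`, `SourceMap`, `innermost_for_cbor`, `example_map`) exist in the crate; they are assumptions for illustration only, and the example map is written by hand rather than produced by a converter.

```rust
use std::ops::Range;

/// One source-map entry (hypothetical type): a byte range in the CBOR
/// encoding and the matching byte range in the diagnostic-notation text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MapEntry {
    pub cbor: Range<usize>,
    pub diag: Range<usize>,
}

/// Hypothetical source map: a flat list of entries that may nest.
#[derive(Debug, Default)]
pub struct SourceMap {
    pub entries: Vec<MapEntry>,
}

impl SourceMap {
    /// Innermost (shortest) entry whose CBOR range contains `offset`.
    pub fn innermost_for_cbor(&self, offset: usize) -> Option<&MapEntry> {
        self.entries
            .iter()
            .filter(|e| e.cbor.contains(&offset))
            .min_by_key(|e| e.cbor.len())
    }

    /// Innermost (shortest) entry whose DN range contains `offset`.
    pub fn innermost_for_diag(&self, offset: usize) -> Option<&MapEntry> {
        self.entries
            .iter()
            .filter(|e| e.diag.contains(&offset))
            .min_by_key(|e| e.diag.len())
    }
}

/// Hand-written map for the CBOR bytes `82 01 02` and the DN text `[1, 2]`.
pub fn example_map() -> SourceMap {
    SourceMap {
        entries: vec![
            MapEntry { cbor: 0..3, diag: 0..6 }, // the whole array
            MapEntry { cbor: 1..2, diag: 1..2 }, // the element 1
            MapEntry { cbor: 2..3, diag: 4..5 }, // the element 2
        ],
    }
}
```

With a map like this, a two-paned view could take the cursor's byte offset in one pane, look up the innermost entry, and highlight the paired range in the other pane. A flat list with a linear scan is the simplest possible layout; a tree or sorted index would be an obvious refinement for larger documents.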

Nemo157 · 2023-02-15T14:39:26Z

It would be relatively easy to add spans on the AST pointing back to the parsed input; I had planned to port the parsing to use chumsky at some point, which makes it trivial. These spans can be generic: either a Range<usize> index into the input, an &str/&[u8] substring/subslice, or () when you don't care about the spans. It might then also be possible to have an API like fn to_diag_with_map(DataItem) -> (String, DataItem) that generates a copy of the AST with spans as if it had parsed the output String (or, in the worst case, you just encode and then re-parse to generate this new AST).
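As a rough illustration of the generic-span idea, the sketch below uses a deliberately tiny stand-in AST. `Item<S>`, `map_spans`, `resolve`, `strip`, and `example` are hypothetical names, not the crate's API; the real `DataItem` has many more variants and is not shown here as being generic over a span type.

```rust
use std::ops::Range;

/// Hypothetical, heavily simplified AST that is generic over its span type.
#[derive(Debug, Clone, PartialEq)]
pub enum Item<S> {
    Integer { value: u64, span: S },
    Array { items: Vec<Item<S>>, span: S },
}

impl<S> Item<S> {
    /// The span attached to this node.
    pub fn span(&self) -> &S {
        match self {
            Item::Integer { span, .. } | Item::Array { span, .. } => span,
        }
    }

    /// Rewrite every span with `f` while keeping the tree shape.
    pub fn map_spans<T, F: FnMut(S) -> T>(self, f: &mut F) -> Item<T> {
        match self {
            Item::Integer { value, span } => Item::Integer { value, span: f(span) },
            Item::Array { items, span } => {
                let span = f(span);
                let mut out = Vec::with_capacity(items.len());
                for item in items {
                    out.push(item.map_spans(f));
                }
                Item::Array { items: out, span }
            }
        }
    }
}

/// Turn index spans into borrowed substrings of the input they index.
pub fn resolve<'a>(item: Item<Range<usize>>, input: &'a str) -> Item<&'a str> {
    item.map_spans(&mut |r: Range<usize>| &input[r])
}

/// Drop the spans entirely, for callers that do not care about them.
pub fn strip(item: Item<Range<usize>>) -> Item<()> {
    item.map_spans(&mut |_| ())
}

/// Hand-built tree with the spans a parser might record for `[1, 2]`.
pub fn example() -> Item<Range<usize>> {
    Item::Array {
        items: vec![
            Item::Integer { value: 1, span: 1..2 },
            Item::Integer { value: 2, span: 4..5 },
        ],
        span: 0..6,
    }
}
```

Keeping the span as a type parameter means the same tree type covers all three cases from the comment above: `Item<Range<usize>>` for indices, `Item<&str>` for substrings, and `Item<()>` when spans are not wanted.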

Being able to mutate the input string and AST while retaining correct spans seems very complicated (and even more so when adding a second input string that is expected to produce the same AST, other than the part that has just been modified). I think it would be possible to build a two-pane interactive editor with the above API by walking both ASTs in parallel to match up the items. It'd be expensive because of the re-encoding on every edit, but with the expected sort of document sizes (at least those I've seen) it should be fast enough.

chrysn · 2023-02-15T15:02:26Z

Sounds viable -- and yes, no need to mutate; whoever edits documents so large that this is a problem probably doesn't do it this way.

The walking-both-ASTs-in-parallel part is what scares me most about it. But maybe cbor-diag-rs could provide an iterator for parallel walking, which asserts that the trees are like-shaped.
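A minimal sketch of such a lockstep walk, under the assumption that both trees carry `Range<usize>` spans: `Node`, `ShapeMismatch`, and `zip_spans` are hypothetical names, and this version is a plain function that returns an error on a shape mismatch instead of an iterator that asserts.

```rust
use std::ops::Range;

/// Hypothetical minimal tree: every node carries a span into its own
/// serialization (CBOR bytes or DN text).
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Leaf { span: Range<usize> },
    Seq { children: Vec<Node>, span: Range<usize> },
}

impl Node {
    /// The span attached to this node.
    pub fn span(&self) -> &Range<usize> {
        match self {
            Node::Leaf { span } | Node::Seq { span, .. } => span,
        }
    }
}

/// Returned when the two trees are not like-shaped.
#[derive(Debug, PartialEq, Eq)]
pub struct ShapeMismatch;

/// Walk two trees in lockstep (pre-order) and collect the paired spans.
/// Fails with `ShapeMismatch` if node kinds or child counts differ.
pub fn zip_spans(
    a: &Node,
    b: &Node,
) -> Result<Vec<(Range<usize>, Range<usize>)>, ShapeMismatch> {
    let mut out = Vec::new();
    let mut stack = vec![(a, b)];
    while let Some((x, y)) = stack.pop() {
        out.push((x.span().clone(), y.span().clone()));
        match (x, y) {
            (Node::Leaf { .. }, Node::Leaf { .. }) => {}
            (Node::Seq { children: cx, .. }, Node::Seq { children: cy, .. })
                if cx.len() == cy.len() =>
            {
                // Push in reverse so that children pop in document order.
                stack.extend(cx.iter().zip(cy.iter()).rev());
            }
            _ => return Err(ShapeMismatch),
        }
    }
    Ok(out)
}
```

Run on one tree with spans into the CBOR bytes and one with spans into the DN text, the result is exactly the list of corresponding ranges a source map needs. Returning a `Result` rather than panicking is a design choice of this sketch; it lets a caller such as an editor recover when the two panes are temporarily out of sync.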

That walking-like-shaped-trees mechanism might, by the way, also benefit processes such as back-annotation (we have a DN with comments and the like, turned it into CBOR, and now get an edited CBOR which we'd like to see in the same structure), and possibly semantic diffs. Both are out of scope as I understand it, but might be related enough to lay some groundwork for, should the need arise.

chrysn mentioned this issue Jun 14, 2023

Colorize output #20

Open
