Source-map style position annotation #133

Open
chrysn opened this issue Feb 15, 2023 · 2 comments

Comments

@chrysn
Contributor

chrysn commented Feb 15, 2023

For interactive editing (highlighting cursor positions in a two-paned hex and diagnostic view), or for debugging (implementing pd-body-error-position), it would be cool to match ranges of bytes encoded in CBOR to ranges of bytes encoded in diagnostic notation -- similar to how a compiler outputs debug information matching instructions to source lines.

This is tangentially related to #20, as it would pave the way to color-highlighting hex output.

One thing that'll make this relatively hard for this crate is that it's interconverting via a mutable AST (which on its own is great, just needs some more effort here). A relatively easy API would be to turn a CBOR byte string into a DN text string (or vice versa), and also produce a source map as a list of corresponding (frequently nested) ranges. There's probably a design pattern by which the AST can keep cursors in two serializations, but I don't know how to make a pretty API out of it, or how to do it with neither pinning nor Rc'ing nor indices for which it isn't completely clear which slice they relate to.
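To make the "list of corresponding (frequently nested) ranges" idea concrete, here is a minimal sketch of what such a source map could look like. The names `MapEntry`, `SourceMap` and `innermost_at_cbor` are hypothetical and not part of this crate; the sketch only assumes that each entry pairs a byte range in the CBOR encoding with a byte range in the diagnostic-notation text.

```rust
use std::ops::Range;

/// One source-map entry (hypothetical type): a byte range in the encoded
/// CBOR and the matching byte range in the diagnostic-notation text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MapEntry {
    pub cbor: Range<usize>,
    pub diag: Range<usize>,
}

/// Hypothetical source map. Entries may nest: the entry for an array
/// encloses the entries of its items.
#[derive(Debug, Default)]
pub struct SourceMap {
    pub entries: Vec<MapEntry>,
}

impl SourceMap {
    /// Find the innermost (shortest) entry whose CBOR range contains
    /// `offset`, e.g. to highlight the item under a cursor in the hex pane.
    pub fn innermost_at_cbor(&self, offset: usize) -> Option<&MapEntry> {
        self.entries
            .iter()
            .filter(|e| e.cbor.contains(&offset))
            .min_by_key(|e| e.cbor.len())
    }
}
```

For the bytes `82 01 02` and the text `[1, 2]`, the map would hold one entry for the whole array and one per integer; looking up CBOR offset 2 would then yield the diagnostic range of the `2`. A linear scan is used here for brevity; a real implementation might prefer a sorted or tree-shaped structure.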

@Nemo157
Member

Nemo157 commented Feb 15, 2023

It would be relatively easy to add spans on the AST pointing back to the parsed input; I had planned to port the parsing to use chumsky at some point, which makes it trivial. These spans can be generic and either a Range<usize> index into the input or an &str/&[u8] substring/subslice (or () when you don't care about the spans). It might then also be possible to have an API like fn to_diag_with_map(DataItem) -> (String, DataItem) that generates a copy of the AST with spans as if it had been parsed from the output String (or, worst case, you just encode and then re-parse to generate this new AST).
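As a rough illustration of the generic-span idea, the sketch below uses a deliberately tiny stand-in AST. `Item<S>`, `map_span` and `resolve` are hypothetical names for this example only; the crate's real `DataItem` has many more variants, and no span parameter is assumed to exist today.

```rust
use std::ops::Range;

/// Hypothetical, heavily simplified AST node, generic over its span type `S`.
#[derive(Debug, Clone, PartialEq)]
pub enum Item<S> {
    Integer(u64, S),
    Array(Vec<Item<S>>, S),
}

impl<S> Item<S> {
    /// The span attached to this node.
    pub fn span(&self) -> &S {
        match self {
            Item::Integer(_, s) | Item::Array(_, s) => s,
        }
    }

    /// Convert every span in the tree, e.g. `map_span(&|_| ())` for
    /// callers that do not care about positions.
    pub fn map_span<T>(self, f: &impl Fn(S) -> T) -> Item<T> {
        match self {
            Item::Integer(v, s) => Item::Integer(v, f(s)),
            Item::Array(items, s) => {
                let items = items.into_iter().map(|i| i.map_span(f)).collect();
                Item::Array(items, f(s))
            }
        }
    }
}

/// Turn index spans into substrings of the input they were parsed from.
pub fn resolve<'a>(item: Item<Range<usize>>, input: &'a str) -> Item<&'a str> {
    item.map_span(&move |r| &input[r])
}
```

With this shape, the same tree type covers all three cases mentioned above: `Item<Range<usize>>` for indices, `Item<&str>` for subslices, and `Item<()>` when spans are irrelevant.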

Being able to mutate the input string and AST while retaining correct spans seems very complicated to do (and even more so when adding in a second input string that is expected to produce the same AST, other than the part that has just been modified). I think it would be possible to build a two-pane interactive editor with the above API by walking both ASTs in parallel to match up the items. Re-encoding on every edit would be expensive, but with the expected sort of document sizes (at least those I've seen) it should be fast enough.

@chrysn
Contributor Author

chrysn commented Feb 15, 2023

Sounds viable -- and yes, no need to mutate; whoever edits documents so large that this becomes a problem probably doesn't do it this way.

The walking-both-ASTs-in-parallel part is what scares me most about it. But maybe cbor-diag-rs could provide an iterator for parallel walking, which asserts that the trees are like-shaped.
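A parallel-walking iterator along these lines could be sketched as follows. This is only an illustration under stated assumptions: `Node<S>`, `ParallelWalk` and `walk_parallel` are hypothetical names, the tree is a minimal stand-in for the real AST, and a shape mismatch simply panics (a real API might prefer to yield a `Result`).

```rust
/// Hypothetical minimal tree whose nodes each carry a span of type `S`.
#[derive(Debug)]
pub enum Node<S> {
    Leaf(S),
    Branch(S, Vec<Node<S>>),
}

impl<S> Node<S> {
    fn span(&self) -> &S {
        match self {
            Node::Leaf(s) | Node::Branch(s, _) => s,
        }
    }

    fn children(&self) -> &[Node<S>] {
        match self {
            Node::Leaf(_) => &[],
            Node::Branch(_, c) => c,
        }
    }
}

/// Pre-order iterator over pairs of corresponding nodes in two trees.
pub struct ParallelWalk<'a, A, B> {
    stack: Vec<(&'a Node<A>, &'a Node<B>)>,
}

pub fn walk_parallel<'a, A, B>(a: &'a Node<A>, b: &'a Node<B>) -> ParallelWalk<'a, A, B> {
    ParallelWalk { stack: vec![(a, b)] }
}

impl<'a, A, B> Iterator for ParallelWalk<'a, A, B> {
    type Item = (&'a A, &'a B);

    fn next(&mut self) -> Option<Self::Item> {
        let (a, b) = self.stack.pop()?;
        // Assert that the trees are like-shaped at this node.
        assert_eq!(
            matches!(a, Node::Leaf(_)),
            matches!(b, Node::Leaf(_)),
            "node kinds differ"
        );
        let (ca, cb) = (a.children(), b.children());
        assert_eq!(ca.len(), cb.len(), "child counts differ");
        // Push children in reverse so they pop in document order.
        self.stack.extend(ca.iter().zip(cb).rev());
        Some((a.span(), b.span()))
    }
}
```

Walking a tree parsed from the CBOR bytes alongside one parsed from the diagnostic text would then yield exactly the pairs of spans an editor needs to link the two panes.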

That walking-like-shaped-trees mechanism might, by the way, also benefit processes such as back-annotation (we have a DN with comments and the like, turned it into CBOR, and now get an edited CBOR which we'd like to see in the same structure), and possibly semantic diffs. Both are out of scope as I understand it, but might be related enough to lay some groundwork for, if it so happens.
