WIP: Several changes #9

strangeglyph · 2022-03-02T14:47:05Z

Hi @cbourjau, I found alice-rs when I was looking to write a little command-line TBrowser-like tool in Rust. Sadly it wasn't able to open my root files, and long story short, here's a big chunk of changes:

Make sure the projects compile on newer rust versions (in particular, this means not using rustfmt as a library, as it doesn't seem to compile on modern nightlies. We now assume that rustfmt is provided as a tool by the system)
Version bumps of several libraries, and get rid of several unnecessarily specific patch-level version pins.

More importantly, in root-io, where I did most my work:

Bump nom to v7
That means getting rid of the macros and using the function-based parsers instead
Include nom-supreme for its postfix combinators
Semantic error types
Pretty printing of errors! Look at it:

While trying to parse tbranch object:
00000180	00 01 0b 40 00 02 35 00 0a 40 00 01 e6 00 0c 40 	...@..5..@..�..@
        	                     ^~~~~~~~~~~~~~~~~~~~~~~~~~
00000190	00 00 1a 00 01 00 01 00 00 00 00 03 50 00 00 06 	............P...
000001a0	72 75 6e 4e 75 6d 06 72 75 6e 4e 75 6d 40 00 00 	runNum.runNum@..
000001b0	06 00 02 00 00 03 e9 00 00 00 01 00 00 a0 00 00 	......�......�..
000001c0	00 00 00 00 00 00 01 00 00 00 00 00 00 00 0a 00 	................
000001d0	00 00 00 00 00 00 0a 00 00 00 62 00 00 00 00 00 	..........b.....
000001e0	00 00 0a 00 00 00 00 00 00 00 00 00 00 00 00 00 	................
000001f0	00 00 73 00 00 00 00 00 00 00 63 40 00 00 15 00 	..s.......c@....
00000200	03 00 01 00 00 00 00 03 00 00 00 00 00 00 00 00 	................
00000210	00 00 00 00 40 00 00 6e 00 03 00 01 00 00 00 00 	....@..n........
00000220	03 00 00 00 00 00 00 00 01 00 00 00 00 40 00 00 	.............@..
00000230	55 ff ff ff ff 54 4c 65 61 66 45 6c 65 6d 65 6e 	U����TLeafElemen
00000240	74 00 40 00 00 40 00 01 40 00 00 32 00 02 40 00 	t.@..@..@..2..@.
00000250	00 1a 00 01 00 01 00 00 00 00 03 00 00 00 06 72 	...............r
00000260	75 6e 4e 75 6d 06 72 75 6e 4e 75 6d 00 00 00 01 	unNum.runNum....
00000270	00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 	................
00000280	00 01 00 00 00 0d 40 00 00 1d 00 03 00 01 00 00 	......@.........
        	[0x135 bytes omitted]

Expected EOF, but found excess data
00000370	00 00 00 14 62 61 63 6f 6e 68 65 70 3a 3a 54 45 	....baconhep::TE
        	         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
00000380	76 65 6e 74 49 6e 66 6f 14 62 61 63 6f 6e 68 65 	ventInfo.baconhe
00000390	70 3a 3a 54 45 76 65 6e 74 49 6e 66 6f 00 cd 50 	p::TEventInfo.�P
000003a0	8e 09 00 07 00 00 00 01 00 00 00 00 00 00 00 0d 	�...............
000003b0	00 00 00 00 00 00 00 00 00 00 00 00 40 00 02 30 	............@..0
        	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is some errors in the test-suite I haven't managed to get rid of yet though, so this PR remains as a WIP for now. Could you point me at the actual documentation of the root file format? I haven't been able to find it online.

cbourjau · 2022-03-02T18:55:12Z

Cool! I'll take a closer look in the next few days. Concerning documentation I had the same problem back in the days. I recommend:

This post of me asking the same question: https://root-forum.cern.ch/t/documentation-for-roots-file-format/26260/2
The groot project (links from the above thread are dead; the projects go merged into one repo): https://pkg.go.dev/go-hep.org/x/hep/groot?utm_source=godoc

cbourjau · 2022-03-13T10:38:10Z

@strangeglyph I finally found the time to take a closer look at this PR. Thanks again! Clearly, there is a lot going on in this PR and I think it would be easier if it were broken up in smaller chunks that are known to work. Would it be possible to break it up into the following:

Replace rustfmt with prettyplease
Remove version pins
Update nom to v7 and remove macros (might be a smoother transition to convert macros to functions on the current version an then update)

Once that is done and known to work we can look at the next steps. If we touch the error handling of this project it, I think it would be nice to replace the failure crate completely with the current best practices in error handling (many of the failure features have made it to the standard library in the mean time). I have to admit that I am not completely up to date in this regard, but I believe thiserror fits that bill?

The postfix operators of nom-supreme seem nice, but I am hesitant to pull in a dependency with fairly small adoption in the wider ecosystem just for that. Is nom-supreme also needed for the pretty error messages?

Is it reasonable to break up this PR along those lines?

Edit:
I converted this to a bit of a tracking issue. Your comments about failure/thiserror make perfect sense!

strangeglyph · 2022-03-27T21:11:54Z

Breaking up the PR absolutely makes sense, I'll get around to doing that soon. prettyplease looks like a good find, I'll see if it can be easily integrated.

I have tried to replace failure everywhere I could find it in root-io. I think it's reasonable to replace it in the other crates as well. thiserror is what I've used in root-io so far and I believe it's the current standard for library-level errors. Speaking of, might be worth splitting this whole repo into several, one per crate at some point?

nom-supreme is only needed for the postfix combinators. I was looking into its error types as well, but they aren't particularly suited to what we want (in particular, some error types are only implemented over string inputs, not bytes). However, I can only stress how much the postfix notation improves readability. In my opinion, it would be worth keeping it in.

cbourjau · 2022-05-24T23:12:00Z

Are you still interested in pushing this PR forward? I think it would be very nice to see these changes land and I am also warming up to nom-supreme :D

strangeglyph · 2022-05-25T07:20:19Z

Ah, yes, sorry. I've been a bit busy with life the last couple of weeks but I should hopefully find more time for this project again soon.

strangeglyph · 2022-05-29T20:24:46Z

Got around to investigating these bugs a bit more. Looks like we don't actually parse everything from a tbranch blob? I just verified this happened before the nom upgrades. Is this intended behavior? I had wrapped the call in tbranch_hdr with all_consuming, but if this isn't actually intended to be read in its entirety, I'll remove it.

cbourjau · 2022-05-29T20:38:37Z

I was actually just looking into this, too! I think it would be nice to use eof or all_consuming as much as possible, but as it happens, there are sometimes unexpected paddings or mystery bytes in the ROOT format which make this difficult...

strangeglyph · 2022-05-29T21:20:01Z

I think I wrapped all_consuming everywhere where we pass around Raw buffers and the like. I'll remove it from tbranch_hdr for now and make a note on tbranch (and open an issue?) about the case of the mysterious bytes.

strangeglyph · 2022-06-07T10:30:16Z

So, this version passes the test suite and as far as I can tell matches the behavior of the current master branch. No clue why the AppVeyor build is failing. The entire output just seems to be a big diff, but I can't see an error message.

cbourjau · 2022-06-19T22:33:23Z

Sorry for the late reply. sounds like a rustfmt issue. I'll take a look!

cbourjau · 2022-07-04T20:33:51Z

Alright! things were indeed just a cargo fmt to get everything formatted. That said, I am afraid that this PR is still too large and has a few things where I am not quite sure about their value given their complexity. Would you be interested in spinning out the nom update thiserror error handling (without nom-locate and ouroboros) into a separate PR?

strangeglyph · 2022-07-04T20:38:18Z

Glad to hear that it was just cargo fmt - I'll remember that in the future. Let me make sure I understand you correctly, you'd like two PRs:

PR 1 contains the nom v7 updates and replaces failure with thiserror
PR 2 has the improved error reporting with nom-locate?

cbourjau · 2022-07-05T16:39:38Z

Yes, pretty much. I have to say that I am not quite sold on the merits of nom-local and ouroboros, though (given their added complexity and dependencies). This is one of the reasons why I think it would be nice to see them in isolation. Just by the way: are you aware of dbg_dmp in nom? It is what I usually use for debugging. Certainly, it is nicer to automatize printing such an output when something goes wrong, though.

strangeglyph mentioned this pull request Apr 3, 2022

Rustfmt to prettyplease #10

Merged

cbourjau mentioned this pull request Apr 15, 2022

Remove failure crate in favor of thiserror #12

Open

strangeglyph force-pushed the master branch 2 times, most recently from f659bc2 to 104e407 Compare April 17, 2022 19:48

strangeglyph mentioned this pull request Apr 17, 2022

Version bumps #11

Merged

cbourjau mentioned this pull request Apr 18, 2022

Remove nom macros #14

Open

lschuetze added 5 commits June 5, 2022 22:28

2022 update

99420a3

Bug fixes and polishing

2b14b97

Bump root-io version to account for signature changes

47ddeaf

Changes to make current behavior consistent with old root-io behavior

cabf900

Merge upstream/master

e8eba83

strangeglyph force-pushed the master branch from 104e407 to e8eba83 Compare June 7, 2022 09:49

cbourjau added 2 commits June 29, 2022 10:26

Run cargo fmt on this PR

afc4e58

Remove unused unstable feature

0b849c6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Several changes #9

WIP: Several changes #9

strangeglyph commented Mar 2, 2022

cbourjau commented Mar 2, 2022

cbourjau commented Mar 13, 2022 •

edited

strangeglyph commented Mar 27, 2022

cbourjau commented May 24, 2022

strangeglyph commented May 25, 2022

strangeglyph commented May 29, 2022

cbourjau commented May 29, 2022

strangeglyph commented May 29, 2022

strangeglyph commented Jun 7, 2022

cbourjau commented Jun 19, 2022

cbourjau commented Jul 4, 2022

strangeglyph commented Jul 4, 2022

cbourjau commented Jul 5, 2022

WIP: Several changes #9

Are you sure you want to change the base?

WIP: Several changes #9

Conversation

strangeglyph commented Mar 2, 2022

cbourjau commented Mar 2, 2022

cbourjau commented Mar 13, 2022 • edited

strangeglyph commented Mar 27, 2022

cbourjau commented May 24, 2022

strangeglyph commented May 25, 2022

strangeglyph commented May 29, 2022

cbourjau commented May 29, 2022

strangeglyph commented May 29, 2022

strangeglyph commented Jun 7, 2022

cbourjau commented Jun 19, 2022

cbourjau commented Jul 4, 2022

strangeglyph commented Jul 4, 2022

cbourjau commented Jul 5, 2022

cbourjau commented Mar 13, 2022 •

edited