Problem
Modern sanitizers have done a lot to make C/C++ development tolerable. Especially with the requirement to move all external scanners to C, I believe it would be incredibly useful to add a -sanitize flag to the tree-sitter CLI that compiles and links all parser/scanner code with every available sanitizer, then uses that binary to run the subcommand (most usefully parse and test). Speaking from personal experience, this is certain to unmask memory leaks and undefined behavior lurking in external scanners.
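To make the kind of bug in question concrete, here is a minimal sketch of the state handling a typical external scanner does, written so that it stays clean under sanitizers. Everything in it is hypothetical and for illustration only: the Scanner struct, the scanner_* function names, and the indentation-stack design are not taken from any real grammar, and the buffer size constant is redefined locally (it mirrors tree-sitter's TREE_SITTER_SERIALIZATION_BUFFER_SIZE) so the sketch compiles without the tree-sitter headers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors tree-sitter's TREE_SITTER_SERIALIZATION_BUFFER_SIZE; redefined
   here (an assumption of this sketch) so no tree-sitter header is needed. */
#define SERIALIZATION_BUFFER_SIZE 1024

/* Hypothetical scanner state: a growable stack of indentation widths. */
typedef struct {
    uint8_t *indents;
    size_t len;
    size_t cap;
} Scanner;

Scanner *scanner_create(void) {
    Scanner *s = calloc(1, sizeof(Scanner));
    assert(s != NULL);
    return s;
}

/* Forgetting either free() here is the classic leak that
   LeakSanitizer reports at exit. */
void scanner_destroy(Scanner *s) {
    free(s->indents);
    free(s);
}

/* Returns 1 on success, 0 if the stack could not grow. */
int scanner_push(Scanner *s, uint8_t indent) {
    if (s->len == s->cap) {
        size_t new_cap = s->cap ? s->cap * 2 : 8;
        uint8_t *grown = realloc(s->indents, new_cap);
        if (grown == NULL) return 0;
        s->indents = grown;
        s->cap = new_cap;
    }
    s->indents[s->len++] = indent;
    return 1;
}

/* Copies the stack into the caller's buffer and returns the byte count.
   Without the bounds check, a deep stack overflows the buffer, which
   AddressSanitizer would flag. */
unsigned scanner_serialize(const Scanner *s, char *buffer) {
    if (s->len > SERIALIZATION_BUFFER_SIZE) return 0;
    /* Skip memcpy for an empty stack: indents is still NULL then, and
       passing NULL to memcpy is undefined behavior even for length 0. */
    if (s->len > 0) memcpy(buffer, s->indents, s->len);
    return (unsigned)s->len;
}

/* Rebuilds the stack from a previously serialized buffer. */
void scanner_deserialize(Scanner *s, const char *buffer, unsigned length) {
    s->len = 0;
    for (unsigned i = 0; i < length; i++) {
        if (!scanner_push(s, (uint8_t)buffer[i])) return;
    }
}
```

Compiling code like this with clang or gcc using -g -fsanitize=address,undefined is the manual step a -sanitize flag would take care of. Removing the bounds check or one of the free() calls should then produce a sanitizer report at runtime rather than silent corruption.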
Expected behavior
I use sanitizers to test the grammar I develop, which has a fairly complicated external scanner. In lieu of something like the proposed -sanitize flag, I instead created a small C++ shim (adapted from the fuzzer shim) that just loads a single file into the tree-sitter C library (itself also specially compiled with sanitizers) and then parses it. You can see the build script & shim here; they require the tree-sitter repo to be present as a git submodule.
I believe the -sanitize flag would unlock these powerful C/C++ development tools for people who don't want to futz around with compiling & linking things themselves. In particular, it would let people run their own test corpus instead of being restricted to parsing a single file like I am with my simple C++ shim.
There are a number of difficulties I foresee when implementing this feature, because I have experienced them myself - but hopefully someone here has a much higher frustration threshold for debugging cross-platform C/C++ linking issues than I do, and can overcome them:
The CLI is a Rust program loading native binaries, and I had trouble loading the address sanitizer from it when I tried; see this Stack Overflow question I asked.
There is a fair amount of cross-platform inconsistency with sanitizers. My setup only works on Linux, and even there it sometimes segfaults for some reason on the GitHub Linux CI machines (see the logs of the Corpus Tests (Linux) step here). It also possibly(?) works on macOS, although it spits out errors about being unable to find certain library symbols. Allegedly MSVC has sanitizers on Windows, but I've never even tried to use them.
Ideally people would be able to step through their scanner in a debugger, but I don't know how well gdb handles the transition from the CLI into the C/C++ code (never tried).
There is a fair amount of risk that this feature would generate a lot of "doesn't work on my machine" bugs, but its value is undeniable, especially if everything is transitioning to C.
Yeah, I was thinking about adding a --fuzz/--harden flag that encompasses this along with fuzzing edits via random mutations. That approach can find hidden bugs (and has found some before!) because it works with a properly structured tree instead of random bytes fed in by libFuzzer.