Widen language support to all languages supported by tree-sitter #152

Open
spartanatreyu opened this issue Jul 18, 2021 · 13 comments
Labels
enhancement New feature or request good first issue Good for newcomers

@spartanatreyu

Tree-sitter states that it has "fairly complete" support for 34 languages with 12 more in development.

Right now this project states its support for 11 languages. Is there a way we can increase this number?

@spartanatreyu spartanatreyu added the enhancement New feature or request label Jul 18, 2021
@afnanenayet
Owner

Yeah, it's actually pretty easy to add new tree-sitter grammars (or at least, it should be). I have them set up as submodules, and in the build.rs file I compile the tree-sitter libraries, link against them, and generate bindings. I'm happy to expand support myself sometime this week, but I'm also very open to someone else taking a crack at this.

@spartanatreyu
Author

Gah, actually looks simple enough but I'm not a rust dev >_<.

I won't have time to learn a new language and learn a new set of build tools for months.

Even though git submodules are a little niche and take a little research and trial and error to get right, I wonder if this task could be tagged as a good first issue...

@bar9
Contributor

bar9 commented Jul 20, 2021

I'd like to see this work with JSON. I'll try to add a json grammar and see where we can go from there.

@spartanatreyu
Author

I didn't see a "JSON" parser, but I'm guessing it could be understood by either the JavaScript, TypeScript, or YAML parsers.

@afnanenayet
Owner

It seems like all JSON should, in theory, be valid JavaScript: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON

@afnanenayet
Owner

@bar9 The process to add a grammar:

  1. Add the tree-sitter submodule to the project
  2. Add an entry for the submodule to the build script, making sure that its name is unique among all the other grammars
  3. Ensure tests pass (there is a test to make sure that tree-sitter can actually load the parsers)
  4. Test by running the program on a javascript/JSON file

The function that compiles the grammar:

diffsitter/build.rs

Lines 59 to 83 in 2e916d0

fn compile_grammar(
    include: &Path,
    c_sources: &[PathBuf],
    cpp_sources: &[PathBuf],
    output_name: &str,
) -> Result<(), cc::Error> {
    if !cpp_sources.is_empty() {
        cc::Build::new()
            .cpp(true)
            .include(include)
            .files(cpp_sources)
            .warnings(false)
            .flag_if_supported("-std=c++14")
            .try_compile(&format!("{}-cpp-compile-diffsiter", &output_name))?;
    }
    if !c_sources.is_empty() {
        cc::Build::new()
            .include(include)
            .files(c_sources)
            .warnings(false)
            .try_compile(&output_name)?;
    }
    Ok(())
}
All the grammars I've seen have C or C++ sources that are compiled into libraries.

An entry for a grammar:

diffsitter/build.rs

Lines 154 to 158 in 2e916d0

GrammarCompileInfo {
    display_name: "php",
    path: PathBuf::from("grammars/tree-sitter-php"),
    c_sources: vec!["parser.c"],
    cpp_sources: vec!["scanner.cc"],

The other code in the build.rs file exists mostly to do some codegen to create the functions to load the parsers.
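
As a rough illustration of step 2, here is a minimal sketch of what adding a JSON entry and checking name uniqueness might look like. The struct below is a hypothetical mirror of only the fields visible in the excerpt (the real one in build.rs may have more), and the "json" entry, its submodule path, and its source list are assumptions, as are the helper names first_duplicate_name and example_grammars.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Hypothetical mirror of the fields visible in the excerpt;
/// the real struct in build.rs may differ.
#[derive(Debug)]
struct GrammarCompileInfo<'a> {
    display_name: &'a str,
    path: PathBuf,
    c_sources: Vec<&'a str>,
    cpp_sources: Vec<&'a str>,
}

/// Returns the first display name that appears more than once, if any.
fn first_duplicate_name<'a>(grammars: &[GrammarCompileInfo<'a>]) -> Option<&'a str> {
    let mut seen = HashSet::new();
    grammars
        .iter()
        .map(|g| g.display_name)
        .find(|name| !seen.insert(*name))
}

fn example_grammars() -> Vec<GrammarCompileInfo<'static>> {
    vec![
        // Entry taken from the excerpt.
        GrammarCompileInfo {
            display_name: "php",
            path: PathBuf::from("grammars/tree-sitter-php"),
            c_sources: vec!["parser.c"],
            cpp_sources: vec!["scanner.cc"],
        },
        // Assumed entry: the submodule path and source list are guesses.
        GrammarCompileInfo {
            display_name: "json",
            path: PathBuf::from("grammars/tree-sitter-json"),
            c_sources: vec!["parser.c"],
            cpp_sources: vec![],
        },
    ]
}

fn main() {
    let grammars = example_grammars();
    // Step 2 of the process: every grammar name must be unique.
    assert!(first_duplicate_name(&grammars).is_none());
    for g in &grammars {
        println!(
            "{} -> {} ({} C, {} C++ sources)",
            g.display_name,
            g.path.display(),
            g.c_sources.len(),
            g.cpp_sources.len()
        );
    }
}
```

A check like this could catch a name clash early instead of at link time, though whether build.rs does anything similar is not stated here.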

I should probably create a contributing.md file with these instructions

@spartanatreyu
Author

spartanatreyu commented Jul 21, 2021

It seems like all JSON should, in theory, be valid JavaScript: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON

A JavaScript parser should also parse JSON, since JSON is a part of JavaScript. I also suggested TypeScript because it is a superset of JavaScript, and YAML because it is a superset of JSON, so both should understand JSON as well as things beyond it.

@afnanenayet afnanenayet added the good first issue Good for newcomers label Jul 21, 2021
@bar9
Contributor

bar9 commented Jul 21, 2021

@afnanenayet thanks for the hints, I'm just trying out the workflow. I guess we can also have a look at this: https://github.com/tree-sitter/tree-sitter-json. It might be faster than using a TS/JS parser, since fewer matches have to be considered. Supporting JSON should be a priority for this project. If it is fast, the potential uses are endless: in redux-like stores for deep diffing of complex state changes, in or near databases for the history of embedded JSON docs, ...

@bar9
Contributor

bar9 commented Jul 21, 2021

Just a detail: YAML is not a superset of JSON, but it is structurally equivalent. This means the resulting AST is the same, but you certainly need completely different parsers (JSON uses braces and brackets for hierarchy, YAML uses indentation).

@afnanenayet
Owner

I didn't know YAML and JSON are structurally equivalent, that's neat. And yeah, super happy to have any help at all! I'm also not particularly attached to the build script so if you see a better way to set this up I'm all ears. I've always thought it was a bit messy to be honest.

@bar9 bar9 mentioned this issue Jul 24, 2021
@bar9
Contributor

bar9 commented Jul 24, 2021

So thanks again for the instructions, the build works like a charm. However, I don't think the diff is doing what we would expect it to do. E.g. if I have a test1.json with the contents:

{
    "hello": "world"
}

and a test2.json with the contents:
{ "hello": "world2" }
The output is

old.json -> new.json
====================

0:
--
+ { "hello": "world2" }

1:
--
-     "hello": "world"

This is just a diff by line. However, the curly braces differ only in whitespace, so they should not appear in the diff. Do I need to do something else, e.g. configure parser tokens?
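
To make the expectation concrete, here is a minimal sketch of a whitespace-insensitive diff over JSON tokens. It is not diffsitter's algorithm: the hand-written tokenizer stands in for the leaf nodes a tree-sitter grammar would produce, and the function names tokenize and diff_tokens are made up for this illustration.

```rust
/// Splits JSON text into tokens, dropping insignificant whitespace.
/// A simplified tokenizer for illustration; it does not validate JSON.
fn tokenize(src: &str) -> Vec<String> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens: Vec<String> = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c == '"' {
            // String literal: scan to the closing quote, skipping escaped characters.
            let start = i;
            i += 1;
            while i < chars.len() && chars[i] != '"' {
                if chars[i] == '\\' {
                    i += 1;
                }
                i += 1;
            }
            i = (i + 1).min(chars.len());
            tokens.push(chars[start..i].iter().collect());
        } else if "{}[]:,".contains(c) {
            tokens.push(c.to_string());
            i += 1;
        } else {
            // Numbers and the literals true/false/null.
            let start = i;
            while i < chars.len()
                && !chars[i].is_whitespace()
                && !"{}[]:,\"".contains(chars[i])
            {
                i += 1;
            }
            tokens.push(chars[start..i].iter().collect());
        }
    }
    tokens
}

/// Returns (removed, added) tokens using a longest-common-subsequence table.
fn diff_tokens(old: &[String], new: &[String]) -> (Vec<String>, Vec<String>) {
    let (n, m) = (old.len(), new.len());
    // lcs[i][j] is the LCS length of old[i..] and new[j..].
    let mut lcs = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            lcs[i][j] = if old[i] == new[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }
    let (mut i, mut j) = (0, 0);
    let (mut removed, mut added) = (Vec::new(), Vec::new());
    while i < n && j < m {
        if old[i] == new[j] {
            i += 1;
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            removed.push(old[i].clone());
            i += 1;
        } else {
            added.push(new[j].clone());
            j += 1;
        }
    }
    removed.extend_from_slice(&old[i..]);
    added.extend_from_slice(&new[j..]);
    (removed, added)
}

fn main() {
    let old = tokenize("{\n    \"hello\": \"world\"\n}");
    let new = tokenize("{ \"hello\": \"world2\" }");
    let (removed, added) = diff_tokens(&old, &new);
    for t in &removed {
        println!("- {}", t);
    }
    for t in &added {
        println!("+ {}", t);
    }
}
```

With the two files above as input, only the string value is reported (`- "world"` and `+ "world2"`); the braces and the key match regardless of layout, which is the behaviour I would have expected from an AST-based diff.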

@afnanenayet
Owner

No, you don't (or at least, shouldn't) have to do any configuration of your own beyond adding a grammar.

@0xdevalias
Contributor

Running diffsitter 0.8.1 on macOS via homebrew:

⇒ diffsitter --version
diffsitter 0.8.1

It lists support for typescript / tsx:

⇒ diffsitter list
This program was compiled with support for:
- bash
- c_sharp
- cpp
- css
- go
- hcl
- java
- json
- ocaml
- php
- python
- ruby
- rust
- tsx
- typescript

Yet by default, it will fail to run against a JavaScript file:

⇒ git difftool --tool diffsitter HEAD~1 HEAD -- unpacked/_next/static/\[buildHash\]/_buildManifest.js
Error: Unsupported file type with no fallback command specified.

Until a file-association override is added to the config (${XDG_HOME:-$HOME}/.config/diffsitter/config.json5):

// ..snip..
  "grammar": {
    "dylib-overrides": null,
    "file-associations": {
      "js": "typescript",
      "jsx": "tsx"
    },
  },
// ..snip..

This would seem like a useful thing to be included in the default config that diffsitter uses:

⇒ diffsitter dump-default-config
{
  "file-associations": null,
  "formatting": {
    "default": "unified",
    "unified": {
      "addition": {
        "highlight": null,
        "regular-foreground": "green",
        "emphasized-foreground": "green",
        "bold": true,
        "underline": false,
        "prefix": "+ "
      },
      "deletion": {
        "highlight": null,
        "regular-foreground": "red",
        "emphasized-foreground": "red",
        "bold": true,
        "underline": false,
        "prefix": "- "
      }
    },
    "json": {
      "pretty_print": false
    },
    "custom": {}
  },
  "grammar": {
    "dylib-overrides": null,
    "file-associations": null
  },
  "input-processing": {
    "split-graphemes": true,
    "exclude-kinds": null,
    "include-kinds": null
  },
  "fallback-cmd": null
}

I also noted that modifying that default config to add the file-associations to the root key didn't seem to work; it only seemed to work when I added them to the grammar version of file-associations.
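
For illustration, the lookup behaviour I observed could be modelled roughly as follows. This is a sketch under assumptions, not diffsitter's actual implementation: the built-in extension table and the function names builtin_language and resolve_language are hypothetical, and only the "js" and "jsx" overrides come from my config above.

```rust
use std::collections::HashMap;

/// Hypothetical built-in extension table; the real mapping in diffsitter may differ.
fn builtin_language(ext: &str) -> Option<&'static str> {
    match ext {
        "ts" => Some("typescript"),
        "tsx" => Some("tsx"),
        "json" => Some("json"),
        "py" => Some("python"),
        _ => None,
    }
}

/// User overrides win; otherwise fall back to the built-in table.
fn resolve_language<'a>(ext: &str, overrides: &'a HashMap<String, String>) -> Option<&'a str> {
    overrides
        .get(ext)
        .map(String::as_str)
        .or_else(|| builtin_language(ext))
}

fn main() {
    let mut overrides: HashMap<String, String> = HashMap::new();
    // Without an override, "js" has no grammar (the "Unsupported file type" case).
    assert_eq!(resolve_language("js", &overrides), None);

    // Equivalent of the "file-associations" block under "grammar" in config.json5.
    overrides.insert("js".to_string(), "typescript".to_string());
    overrides.insert("jsx".to_string(), "tsx".to_string());

    assert_eq!(resolve_language("js", &overrides), Some("typescript"));
    assert_eq!(resolve_language("jsx", &overrides), Some("tsx"));
    assert_eq!(resolve_language("ts", &overrides), Some("typescript"));
    println!("js resolves to {:?}", resolve_language("js", &overrides));
}
```

If the defaults worked like this, shipping "js" and "jsx" in the built-in table would make the override unnecessary for most users.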


4 participants