Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature proposal: annotate source with token/character counts for functions #12

Open
sparr opened this issue May 21, 2023 · 6 comments · May be fixed by #14
Open

Feature proposal: annotate source with token/character counts for functions #12

sparr opened this issue May 21, 2023 · 6 comments · May be fixed by #14

Comments

@sparr
Copy link

sparr commented May 21, 2023

I want to add per-function token and character counts, pre and post minification, to https://github.com/sparr/pico8lib. My initial idea was to use a script to parse the functions out of a source file, then feed them separately to shrinko8 -c, parse the output, prepend it as a comment to the function, then reassemble the source file. However, it occurs to me that a lot of the necessary functionality to do this better already exists in this project.

My proposal is a mode for shrinko8 that does all of this directly. Parse a cart, identify all the functions, count their tokens and characters, minify them, count those tokens and characters, insert a comment into the original cart above the function (possibly before existing comments, and overwriting a previous comment from this process) and emit the annotated original cart.

I'm willing to do most of the work to make this happen, but I find the code here a bit daunting. I've made it to process_code where it calls tokenize then parse, and I can look through Nodes under root, but I am not sure how to get back from a Node to tokens that count_tokens wants, or how to turn a Node back into a cart after I've modified some of its grandchildren. Generally moving back and forth between the source, tokens, and nodes, in a way that keeps them mapped to each other for modifying one based on the other, is a mystery to me so far. I would appreciate any pointers you can share before I get started on this. Even if this isn't a welcome feature for the main repo, I'd still like to pursue it as a fork or patch so I can use it myself.

@thisismypassport
Copy link
Owner

thisismypassport commented May 21, 2023

Currently, the minification steps modify both the node tree and tokens array. I'm not too happy about this either, and will probably improve this as part of a branch I'm (potentially) working on.

Either way, you can traverse over the tokens in a node tree via traverse_tokens and add them to your own array, then count that.

I'm not sure how easy it is to split the code and minify each function separately - especially if you want to do renaming, as that's an inherently global operation. It might be better to minify the whole thing at once.

Currently, minify_code converts a node tree back to source at the end - using two approaches depending on whether minify_wspace was requested. Splitting this logic into a separate function would definitely be nice. (I may also do this in my branch)

However, it sounds like you don't want to output minified code, just lightly modified code. The logic to do so would likely be somewhat similar (in structure) to the bottom else case of minify_code (minus the complications) - traverse over tokens, append their contents and the wspace between them, except with changes to the comments as needed. (The comments are part of the wspace between tokens and aren't captured explicitly in the node tree or the token list)

@sparr
Copy link
Author

sparr commented May 27, 2023

Below are a simple cart I'm testing with and a jsonpickled representation of root. I can see the token that represents the function keyword, but I can't see any obvious way to capture or manipulate the whole function definition (from function to end). It feels like I'm missing some layer of the puzzle here.

pico-8 cartridge // http://www.pico-8.com
version 18
__lua__
-- comment
function funcname()
 return true
end
{
  "children": [ {
    "body": { "py/id": 25 },
    "children": [
      {
        "children": { "py/tuple": [] },
        "endidx": 19,
        "idx": 11,
        "modified": false,
        "parent": { "py/id": 2 },
        "py/object": "pico_tokenize.Token",
        "source": {
          "cart": {
            "code": "-- comment\nfunction funcname()\n return true\nend",
            "code_map": [ {
              "py/newargs": { "py/tuple": [ { "py/tuple": [ 0, "test.p8", "-- comment\nfunction funcname()\n return true\nend", 0, 3 ] } ] },
              "py/object": "pico_cart.CodeMapping",
              "py/seq": [ 0, "test.p8", "-- comment\nfunction funcname()\n return true\nend", 0, 3 ]
            } ],
            "code_rom": null,
            "meta": { "default_factory": { "py/type": "builtins.list" }, "py/object": "collections.defaultdict" },
            "name": "test.p8",
            "platform": "l",
            "py/object": "pico_cart.Cart",
            "rom": { "py/object": "pico_defs.Memory" },
            "screenshot": null,
            "version_id": 18,
            "version_tuple": { "py/tuple": [ 0, 1, 23, 0 ] }
          },
          "mappings": { "py/id": 9 },
          "py/object": "pico_process.CartSource"
        },
        "type": { "py/object": "pico_tokenize.TokenType", "value": 3 },
        "value": "function",
        "vline": 1
      }, {
        "children": [ {
          "children": { "py/tuple": [] },
          "endidx": 28,
          "idx": 20,
          "modified": false,
          "parent": { "py/id": 12 },
          "py/object": "pico_tokenize.Token",
          "source": { "py/id": 6 },
          "type": { "py/object": "pico_tokenize.TokenType", "value": 2 },
          "value": "funcname",
          "vline": 1
        } ],
        "endidx": 28,
        "idx": 20,
        "kind": { "py/object": "pico_parse.VarKind", "value": 1 },
        "name": "funcname",
        "new": false,
        "parent": { "py/id": 2 },
        "parent_scope": {
          "depth": -1,
          "funcdepth": 0,
          "items": { "_ENV": { "implicit": true, "keys_kind": null, "name": "_ENV", "py/object": "pico_parse.Local", "scope": { "py/id": 19 } } },
          "parent": null,
          "py/object": "pico_parse.Scope"
        },
        "py/object": "pico_parse.Node",
        "scopespec": null,
        "source": { "py/id": 6 },
        "type": { "py/object": "pico_parse.NodeType", "value": 0 },
        "value": null,
        "var": { "keys_kind": null, "name": "funcname", "py/object": "pico_parse.Global" },
        "var_kind": null
      }, {
        "children": { "py/tuple": [] },
        "endidx": 29,
        "idx": 28,
        "modified": false,
        "parent": { "py/id": 2 },
        "py/object": "pico_tokenize.Token",
        "source": { "py/id": 6 },
        "type": { "py/object": "pico_tokenize.TokenType", "value": 4 },
        "value": "(",
        "vline": 1
      }, {
        "children": { "py/tuple": [] },
        "endidx": 30,
        "idx": 29,
        "modified": false,
        "parent": { "py/id": 2 },
        "py/object": "pico_tokenize.Token",
        "source": { "py/id": 6 },
        "type": { "py/id": 23 },
        "value": ")",
        "vline": 1
      }, {
        "children": [ {
          "children": [
            {
              "children": { "py/tuple": [] },
              "endidx": 38,
              "idx": 32,
              "modified": false,
              "parent": { "py/id": 27 },
              "py/object": "pico_tokenize.Token",
              "source": { "py/id": 6 },
              "type": { "py/id": 5 },
              "value": "return",
              "vline": 2
            }, {
              "children": [ {
                "children": { "py/tuple": [] },
                "endidx": 43,
                "idx": 39,
                "modified": false,
                "parent": { "py/id": 30 },
                "py/object": "pico_tokenize.Token",
                "source": { "py/id": 6 },
                "type": { "py/id": 5 },
                "value": "true",
                "vline": 2
              } ],
              "endidx": 43,
              "idx": 39,
              "parent": { "py/id": 27 },
              "py/object": "pico_parse.Node",
              "scopespec": null,
              "source": { "py/id": 6 },
              "token": { "py/id": 32 },
              "type": { "py/object": "pico_parse.NodeType", "value": 3 },
              "value": null
            }
          ],
          "endidx": 43,
          "idx": 32,
          "items": [ { "py/id": 30 } ],
          "parent": { "py/id": 25 },
          "py/object": "pico_parse.Node",
          "scopespec": null,
          "source": { "py/id": 6 },
          "type": { "py/object": "pico_parse.NodeType", "value": 24 },
          "value": null
        } ],
        "endidx": 43,
        "idx": 32,
        "parent": { "py/id": 2 },
        "py/object": "pico_parse.Node",
        "scopespec": { "py/tuple": [ false, [] ] },
        "source": { "py/id": 6 },
        "stmts": [ { "py/id": 27 } ],
        "type": { "py/object": "pico_parse.NodeType", "value": 29 },
        "value": null
      }, {
        "children": { "py/tuple": [] },
        "endidx": 47,
        "idx": 44,
        "modified": false,
        "parent": { "py/id": 2 },
        "py/object": "pico_tokenize.Token",
        "source": { "py/id": 6 },
        "type": { "py/id": 5 },
        "value": "end",
        "vline": 3
      }
    ],
    "endidx": 47,
    "idx": 11,
    "kind": null,
    "name": "funcname",
    "params": [],
    "parent": { "py/id": 0 },
    "py/object": "pico_parse.Node",
    "scopespec": { "depth": 1, "funcdepth": 1, "items": {}, "parent": { "py/id": 19 }, "py/object": "pico_parse.Scope" },
    "source": { "py/id": 6 },
    "target": { "py/id": 12 },
    "type": { "py/object": "pico_parse.NodeType", "value": 15 },
    "value": null
  } ],
  "endidx": 47,
  "globals": { "__dict__": {}, "funcname": { "py/id": 18 }, "py/object": "utils.LazyDict" },
  "idx": 11,
  "parent": null,
  "py/object": "pico_parse.Node",
  "scopespec": { "py/tuple": [ false, [] ] },
  "source": { "py/id": 6 },
  "stmts": [ { "py/id": 2 } ],
  "type": { "py/id": 36 },
  "value": null
}

@thisismypassport
Copy link
Owner

Assuming I understood what you're asking for correctly, the idx and endidx attributes specify the range of the function's tokens within the source (e.g. node.source.text[node.idx : node.endidx])

@sparr
Copy link
Author

sparr commented May 27, 2023

OK, so, the Token with "value":"function" is the literal function in the source, spanning idx 11-19.

Its parent is this Node:

{
    "body": { "py/id": 25 },
    "children": [ "CUT" ],
    "endidx": 47,
    "idx": 11,
    "kind": null,
    "name": "funcname",
    "params": [],
    "parent": { "py/id": 0 },
    "py/object": "pico_parse.Node",
    "scopespec": { "depth": 1, "funcdepth": 1, "items": {}, "parent": { "py/id": 19 }, "py/object": "pico_parse.Scope" },
    "source": { "py/id": 6 },
    "target": { "py/id": 12 },
    "type": { "py/object": "pico_parse.NodeType", "value": 15 },
    "value": null
  }

which spans idx 11-47. That seems like everything from function to end. That Node has "kind": null and "name": "funcname" (because it's a function with that name, not to be confused with the sibling of that earlier token that has "value": "funcname" and is the literal funcname from the source). It also has "scopespec": { "depth": 1, "funcdepth": 1, ... which seems to describe some stuff about the function. Is there any way to directly identify that this Node represents a function, or do I need to duck type it based on the presence of .scopespec.funcdepth or similar?

The parent of that Node is another Node which has the same idx/endidx but different properties. I think it might represent the whole source, minus any leading or trailing comments, although I'll need to experiment more to figure that out.

@sparr
Copy link
Author

sparr commented May 27, 2023

Aha!
"type": { "py/object": "pico_parse.NodeType", "value": 15 },

pico_parse.NodeType.values[15] is "function"!

@sparr sparr linked a pull request May 28, 2023 that will close this issue
@thisismypassport
Copy link
Owner

Yeah, node.type == NodeType.function

Yeah, the parent of the function is the root node, it indeed spans the whole source minus any leading/trailing whitespace/comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants