CONTRIBUTING.md

File metadata and controls

100 lines (61 loc) · 4.78 KB

Project structure

src/
  - index.js --> the "compiler"
  - tokenizer/ --> reads strings, returns tokens
  - parser/  --> reads tokens, returns an AST
  - semantics/ --> reads an AST, returns a typed AST
  - validation/ --> validates an AST for correctness
  - generator/ --> generates IR for the emitter from a typed AST
  - emitter/ --> reads the IR Program, returns the WebAssembly binary encoding
    - section/ --> contains mini-emitters for wasm binary sections
  - utils/
    - stream.js --> basic string stream object used in token generation
    - token-stream.js --> object used to hold tokens generated by the tokenizer
    - output-stream.js --> object used to hold binary and disassembler data
dist/ --> build
docs/ --> .io site with Explorer/Playground

Compiler Phases

To edit the compiler, it is important to understand where a change should be made. The compiler can be visualized as a chain of individual immutable operations, each one taking a specific data type (usually a Node) and returning a new data structure. The one exception is the Validation step, which is an identity function that may throw an Exception.

From left-to-right:

Source -> Parse -> Semantics -> Validate(identity) -> Generate -> Emit -> Binary
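The chain above can be sketched as plain function composition. This is a minimal illustration, not the code in src/index.js: `parse`, `semantics`, `validate`, `generate` and `emit` here are hypothetical toy stand-ins, and the stub emitter ignores its input and returns only the 8-byte wasm header. What it shows is the shape: each transforming phase returns a new structure, while validation hands back the very same object or throws.

```javascript
// Hypothetical stand-ins for the real phases; only the chain's shape matters here.
const compose = (...phases) => (input) => phases.reduce((acc, phase) => phase(acc), input);

// Each transforming phase returns a NEW structure instead of mutating its input.
const parse = (source) => ({ type: "Program", body: source.trim().split(/\s+/) });
const semantics = (ast) => ({ ...ast, typed: true });
const generate = (typedAst) => ({ instructions: [...typedAst.body] });
// Stub emitter: ignores the program and returns only the 8-byte wasm header.
const emit = () => Uint8Array.from([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

// Validation is the identity function: the same object comes out, or it throws.
function validate(typedAst) {
  if (!typedAst.typed) throw new Error("Validation failed: tree has no type information");
  return typedAst;
}

const compile = compose(parse, semantics, validate, generate, emit);
const binary = compile("a b c");
```

Because validation sits in the chain as an identity function, it can be added or removed without changing what the later phases receive.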

Phase 1 - Parsing

1.A - Tokenizing

Before an AST is generated, the source is divided into atomic Tokens, which are then converted into an Abstract Syntax Tree. The tokenizing process is stable and there are no known bugs.
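A minimal sketch of what "dividing the source into atomic Tokens" means. The token kinds, the regular expressions and the `{ type, value, start, end }` token shape are assumptions for illustration, not the project's actual tokenizer or token set.

```javascript
// Illustrative token patterns, tried in order. "=>" is listed before "=" so the
// longer punctuator wins.
const TOKEN_PATTERNS = [
  ["whitespace", /^\s+/],
  ["number", /^\d+(\.\d+)?/],
  ["identifier", /^[A-Za-z_]\w*/],
  ["punctuator", /^(=>|[-+*\/=;:(){},])/],
];

function tokenize(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    const rest = source.slice(pos);
    let matched = false;
    for (const [type, regex] of TOKEN_PATTERNS) {
      const m = rest.match(regex);
      if (m) {
        // Drop whitespace; keep every other token along with its source offsets.
        if (type !== "whitespace") {
          tokens.push({ type, value: m[0], start: pos, end: pos + m[0].length });
        }
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new SyntaxError(`Unexpected character '${source[pos]}' at ${pos}`);
    }
  }
  return tokens;
}

const tokens = tokenize("const x: i32 = 2 + 40;");
```

Recording the start/end offsets on each token is what later lets AST nodes map back to the original source text.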

1.B - Base AST Generation

The initial pass over the Tokens produces the base Abstract Syntax Tree for the compiler. This tree contains no type information. It is usually much lighter and more high-level, representing abstract ideas or the intent of the source program.

Only basic syntactical checks are performed in this phase. Syntax errors may be thrown here.
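To make "tokens in, untyped tree out" concrete, here is a small recursive-descent sketch for `+` and `*` expressions. The `{ type, value, params }` node shape and the grammar are hypothetical simplifications, not the project's parser. Note that the nodes carry no type information, and the only errors raised are syntax errors.

```javascript
// Hypothetical first-pass parser: tokens in, untyped AST out.
function parse(tokens) {
  let i = 0;
  const peek = () => tokens[i];
  const next = () => tokens[i++];

  // primary := number | identifier
  function primary() {
    const token = next();
    if (!token) throw new SyntaxError("Unexpected end of input");
    if (token.type === "number") return { type: "Constant", value: token.value, params: [] };
    if (token.type === "identifier") return { type: "Identifier", value: token.value, params: [] };
    throw new SyntaxError(`Unexpected token '${token.value}'`);
  }

  // term := primary ('*' primary)*   (binds tighter than '+')
  function term() {
    let left = primary();
    while (peek() && peek().value === "*") {
      next();
      left = { type: "BinaryExpression", value: "*", params: [left, primary()] };
    }
    return left;
  }

  // expression := term ('+' term)*
  function expression() {
    let left = term();
    while (peek() && peek().value === "+") {
      next();
      left = { type: "BinaryExpression", value: "+", params: [left, term()] };
    }
    return left;
  }

  const ast = expression();
  if (i < tokens.length) throw new SyntaxError(`Unexpected token '${peek().value}'`);
  return ast;
}

// Tokens for "2 + x * 3", written out by hand.
const ast = parse([
  { type: "number", value: "2" },
  { type: "punctuator", value: "+" },
  { type: "identifier", value: "x" },
  { type: "punctuator", value: "*" },
  { type: "number", value: "3" },
]);
```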

Phase 2 - Semantics

The tree is mapped in this phase, producing a new tree structure which may be used to generate a valid binary. In this phase the relationships between functions, types and variables are assigned. High-level representations are reduced to lower-level operations represented as AST nodes. This is a much more detailed and larger tree than the one in Phase 1. This tree structure can be thought of as representing the WebAssembly equivalent of the walt source code.
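A minimal sketch of such a mapping pass, assuming a hypothetical node shape with a `valueType` field and a hypothetical `CompoundAssignment` node. It does two things. It attaches types using a scope (a `Map` from variable names to types). It also lowers a high-level form (`x += e`) into lower-level nodes (`x = x + e`). The input tree is never mutated.

```javascript
// Hypothetical semantics pass: returns a NEW, typed tree.
function semantics(node, scope) {
  switch (node.type) {
    case "Constant":
      // Assumed rule for this sketch: a decimal point means f32, otherwise i32.
      return { ...node, valueType: node.value.includes(".") ? "f32" : "i32" };
    case "Identifier":
      // Unknown names stay untyped here; catching them is left to validation.
      return { ...node, valueType: scope.get(node.value) };
    case "BinaryExpression":
    case "Assignment": {
      const params = node.params.map((p) => semantics(p, scope));
      return { ...node, params, valueType: params[0].valueType };
    }
    case "CompoundAssignment": {
      // Lower `x += e` into `x = x + e`, built only from lower-level nodes.
      const [target, expr] = node.params;
      const sum = { type: "BinaryExpression", value: node.value[0], params: [target, expr] };
      return semantics({ type: "Assignment", value: "=", params: [target, sum] }, scope);
    }
    default:
      throw new Error(`Unknown node type: ${node.type}`);
  }
}

const base = {
  type: "CompoundAssignment",
  value: "+=",
  params: [
    { type: "Identifier", value: "x", params: [] },
    { type: "Constant", value: "1", params: [] },
  ],
};
const typed = semantics(base, new Map([["x", "i32"]]));
```

The typed result is larger than the input: one `CompoundAssignment` becomes an `Assignment` wrapping a `BinaryExpression`. This mirrors, in miniature, why the Phase 2 tree is bigger than the Phase 1 tree.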

Phase 3 - Validation

Validation is the final step before the AST is used to generate a binary. The connections between AST Nodes assigned in Phase 2 are sanity checked here for errors. This area could use the most help.
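Validation as an identity function can be sketched like this. The two checks (untyped identifiers, mismatched operand types) and the node shape are hypothetical examples of the "sanity checks" described above, not the project's actual rules. The function walks the typed tree, collects errors, and either throws or returns the exact object it was given.

```javascript
// Hypothetical validator: returns the SAME tree untouched, or throws.
function validate(typedAst) {
  const errors = [];
  (function walk(node) {
    if (node.type === "Identifier" && node.valueType === undefined) {
      errors.push(`Undefined variable '${node.value}'`);
    }
    if (node.type === "BinaryExpression") {
      const [left, right] = node.params;
      if (left.valueType !== right.valueType) {
        errors.push(`Type mismatch in '${node.value}': ${left.valueType} vs ${right.valueType}`);
      }
    }
    (node.params || []).forEach(walk);
  })(typedAst);

  if (errors.length > 0) throw new Error(errors.join("\n"));
  return typedAst; // identity: no new tree is built
}
```

Collecting every error before throwing, rather than stopping at the first one, is a design choice of this sketch that gives the user a fuller report per compile.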

Phase 4 - Generator

In this phase we flatten and map the Typed AST to generate the Intermediate Representation (IR) in the form of a Program object. The Program represents the data, imports, exports and instructions generated from our source, and is later consumed by the emitter.
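A sketch of the flattening step, assuming a hypothetical Program shape (`data`, `imports`, `exports`, `functions`) and hypothetical typed-node fields. Because WebAssembly is a stack machine, a post-order walk of the expression tree (operands first, then the operator) yields a flat instruction list. Instruction names follow the wasm text format (`local.get`, `i32.add`), but the `{ op, arg }` objects are an invented IR for illustration.

```javascript
// Hypothetical generator: typed expression tree -> flat instruction list.
function generateExpression(node, locals, out) {
  switch (node.type) {
    case "Constant":
      out.push({ op: `${node.valueType}.const`, arg: Number(node.value) });
      break;
    case "Identifier":
      out.push({ op: "local.get", arg: locals.indexOf(node.value) });
      break;
    case "BinaryExpression": {
      // Post-order: push both operands, then the operator that consumes them.
      node.params.forEach((p) => generateExpression(p, locals, out));
      const ops = { "+": "add", "-": "sub", "*": "mul" };
      out.push({ op: `${node.valueType}.${ops[node.value]}` });
      break;
    }
    default:
      throw new Error(`Cannot generate ${node.type}`);
  }
  return out;
}

// Wrap one typed function into a Program object.
function generate(fn) {
  const locals = fn.params.map((p) => p.value);
  return {
    data: [],
    imports: [],
    exports: [{ name: fn.name, kind: "func", index: 0 }],
    functions: [{
      params: fn.params.map((p) => p.valueType),
      result: fn.valueType,
      instructions: generateExpression(fn.body, locals, []),
    }],
  };
}

const program = generate({
  name: "add",
  valueType: "i32",
  params: [{ value: "a", valueType: "i32" }, { value: "b", valueType: "i32" }],
  body: {
    type: "BinaryExpression", value: "+", valueType: "i32",
    params: [
      { type: "Identifier", value: "a", valueType: "i32" },
      { type: "Identifier", value: "b", valueType: "i32" },
    ],
  },
});
```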

Phase 5 - Emitter

A pure function which takes the Program produced by the generator and converts it to binary, using the output-stream. This is the land of the WebAssembly spec. Few changes need to happen here, as the semantics and generator phases do the heavy lifting of converting the source code into an emit-able Program.
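Below is a minimal emitter sketch. It writes a standalone byte array rather than using the project's output-stream, and the Program shape and the small opcode table are assumptions for illustration. The byte values themselves (the magic number and version, section ids, the `0x60` function type, `0x7f` for i32, the opcodes, and LEB128 encoding) come from the WebAssembly binary format. The result is checked by compiling it with Node's built-in `WebAssembly` API.

```javascript
// Hypothetical emitter: Program object in, wasm bytes out, no side effects.
const VALUE_TYPE = { i32: 0x7f };
const OPCODES = { "local.get": 0x20, "i32.const": 0x41, "i32.add": 0x6a, "i32.sub": 0x6b, "i32.mul": 0x6c };

// Unsigned LEB128, used for counts, sizes and indices.
function uleb(n) {
  const bytes = [];
  do {
    let byte = n & 0x7f;
    n >>>= 7;
    if (n !== 0) byte |= 0x80;
    bytes.push(byte);
  } while (n !== 0);
  return bytes;
}

// Signed LEB128, used for i32.const immediates.
function sleb(n) {
  const bytes = [];
  while (true) {
    const byte = n & 0x7f;
    n >>= 7;
    const done = (n === 0 && (byte & 0x40) === 0) || (n === -1 && (byte & 0x40) !== 0);
    bytes.push(done ? byte : byte | 0x80);
    if (done) return bytes;
  }
}

const vec = (items) => [...uleb(items.length), ...[].concat(...items)];
const section = (id, payload) => [id, ...uleb(payload.length), ...payload];
const name = (s) => { const b = [...Buffer.from(s, "utf8")]; return [...uleb(b.length), ...b]; };

function encodeInstruction({ op, arg }) {
  const code = OPCODES[op];
  if (code === undefined) throw new Error(`Unknown op ${op}`);
  if (op === "i32.const") return [code, ...sleb(arg)];
  if (op === "local.get") return [code, ...uleb(arg)];
  return [code];
}

function emit(program) {
  // Type section: one signature per function (function i uses type i).
  const types = program.functions.map((f) =>
    [0x60, ...vec(f.params.map((p) => [VALUE_TYPE[p]])), ...vec([[VALUE_TYPE[f.result]]])]);
  const funcs = program.functions.map((_, i) => uleb(i));
  const exports = program.exports.map((e) => [...name(e.name), 0x00, ...uleb(e.index)]);
  // Code section: each body is a (here empty) locals vector, instructions, then `end`.
  const bodies = program.functions.map((f) => {
    const body = [...vec([]), ...[].concat(...f.instructions.map(encodeInstruction)), 0x0b];
    return [...uleb(body.length), ...body];
  });
  return Uint8Array.from([
    0x00, 0x61, 0x73, 0x6d, // magic "\0asm"
    0x01, 0x00, 0x00, 0x00, // version 1
    ...section(1, vec(types)),
    ...section(3, vec(funcs)),
    ...section(7, vec(exports)),
    ...section(10, vec(bodies)),
  ]);
}

const bytes = emit({
  exports: [{ name: "add", kind: "func", index: 0 }],
  functions: [{
    params: ["i32", "i32"],
    result: "i32",
    instructions: [{ op: "local.get", arg: 0 }, { op: "local.get", arg: 1 }, { op: "i32.add" }],
  }],
});
const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes));
```

Each section is built as a nested byte array and length-prefixed at the end. This is why the emitter stays simple once the generator has produced a flat Program.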

Developing

Requirements

  • node 8

Node 8+ has native WebAssembly support, and this project takes full advantage of that fact.
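To confirm your local Node exposes the `WebAssembly` global, a quick check like the one below can help. `checkWasmSupport` is a hypothetical helper, not part of the project's scripts. It validates the smallest legal wasm binary, which is just the `\0asm` magic number followed by version 1.

```javascript
// Hypothetical helper: checks the Node major version and native wasm support.
const EMPTY_MODULE = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

function checkWasmSupport() {
  const major = Number(process.versions.node.split(".")[0]);
  const hasWasm = typeof WebAssembly === "object" && WebAssembly.validate(EMPTY_MODULE);
  return { major, hasWasm, ok: major >= 8 && hasWasm };
}

console.log(checkWasmSupport());
```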

Commands

Tests

Every piece of the compiler is unit tested. AVA is used as the test runner.

  • npm run tdd
  • npm run tdd -- --watch
  • npm run tdd -- --watch <spec_file_path>

Do not confuse this with npm test. The npm test command is used for CI integration and generates full code coverage reports. You may still use it if you are interested in the coverage report.

To debug a spec

  • npm run debug -- <spec_file_path>

Helpful APIs and notes:

  • prettyPrint(ast) may be used to debug AST issues. Pretty-print works on both the first-pass tree and the full semantic AST with type information.
  • debug(output-buffer) may be used to examine wasm opcode output
  • String(ast-node): most AST nodes may be coerced to a string to retrieve the original source they represent. Not all of them, because some nodes are hand-crafted and may not have a source-code equivalent.
  • parser/fragment.js module contains fragment methods to generate AST fragments or Nodes. Fragments do not require a full program to generate, they may be created from a snippet of code. Fragments may only be generated from an expression or a statement. Fragments are valid AST Nodes and may be debugged with methods mentioned above.
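To illustrate how string coercion and pretty-printing can work on a tree, here is a sketch. It is not the project's implementation. `makeNode`, the `range` field and this `prettyPrint` are hypothetical. A node remembers the slice of source it came from, so `String(node)` returns that slice, and a hand-crafted node without a range has no source equivalent.

```javascript
// Hypothetical node factory: each node keeps a [start, end) range into the source.
function makeNode(type, value, params, source, range) {
  return {
    type,
    value,
    params,
    // String(node) calls toString(): return the original source slice, or an
    // empty string for hand-crafted nodes that have no range.
    toString() {
      return range ? source.slice(range[0], range[1]) : "";
    },
  };
}

// Hypothetical pretty-printer: one line per node, indented by depth.
function prettyPrint(node, depth = 0) {
  const line = `${"  ".repeat(depth)}${node.type}${node.value !== undefined ? ` ${node.value}` : ""}`;
  return [line, ...node.params.map((p) => prettyPrint(p, depth + 1))].join("\n");
}

const source = "x * 3";
const x = makeNode("Identifier", "x", [], source, [0, 1]);
const three = makeNode("Constant", "3", [], source, [4, 5]);
const mul = makeNode("BinaryExpression", "*", [x, three], source, [0, 5]);
const handCrafted = makeNode("Constant", "0", [], source, null);
```

Here `String(mul)` gives back `x * 3`, while `String(handCrafted)` is empty, matching the caveat about hand-crafted nodes.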

Pull Requests

  • 100% statement coverage must be maintained. A PR that lowers coverage will not be accepted.
  • UI/Explorer changes may require a screenshot and/or a running demo.

Build

If you'd like to see your changes reflected in the explorer page, run the npm run build command.