Skip to content

Compiler Architecture

Jonathan Aldrich edited this page Sep 18, 2018 · 5 revisions

Structure

Phases

  • Lexer - wyvern.tools.lexer
    • Input from a Reader
    • Output: get/peek interface
    • Tokens, Newline, Indent/Dedent
    • Issues
      • Supporting an escape for newlines
      • Should we ignore newlines inside parentheses? Handled in parsing stage 1.
      • Maintaining comments so they can be associated with the next syntactic construct
      • Adding character numbers
      • Useful to have a TokenStream interface that the Lexer implements
  • Parsing Stage 1 - Parse into an uninterpreted tree (RawAST)
    • Input: Stream of tokens (right now a Lexer, eventually a TokenStream)
    • Output: RawAST
    • Issues
      • Preserve line numbers
  • Parsing Stage 2
    • Input: RawAST
    • Output: TypedAST
    • Features
      • Keyword extensibility and keyword-based parsing
      • Symbol resolution, and types of symbols, is done as part of parsing
        • Issue: need to resolve types of all declarations before parsing any definitions. Type resolution may be recursive and thus must be done lazily/on-demand.
        • Furthermore, resolving the type of a declaration before parsing its definition is not (obviously) compatible with type inference for vals. Can make exceptions for some easy things. Could integrate the parsing of val declarations into the lazy type resolution procedure.

Back end FAQ: why do we support Python 3 (and not Python 2)?

  1. python 2 and 3 had some differences, and at some point it was a pain to support both (see especially #3 below)
  2. python 3 seems to be "winning" in terms of new code people are writing, and
  3. python 2 doesn't have a nice object for builtins that we can import; python 3 makes this much easier.