PhD thesis code

SPAD language type checker

Objectives

The project aims to deliver a new type checker for SPAD language. Several improvements over current type checker are planned:

introduce better type inference,
introduce modern language constructs,
produce understandable diagnostic messages (currently the compiler dumps AST as s-expression if it encounters an error),
eliminate well known bugs in the type system,
find new type errors in FriCAS library code.

Techniques employed will involve bottom-up tree walking, graph-like structures for type information management, aggressive type pruning.

Deliverables

The project is going to deliver a partial implementation of type checker for Spad language. It'll be a set of source files with accompanying documentation. The code will have a single entry point - type checker function - which will accept parsed source file and infer types as needed. The result of the type checker will be a type annotated AST (with "$" and "@" operators) ready to be fed into existing Spad compiler, or an error message stating (as clearly as possible) why the type checking failed. The new type checker should:

work on existing Spad code in FriCAS library without massive source changes,
handle most commonly used program structures,
detect as many type errors as possible before passing the code to original compiler,
catch errors that go undetected with current type checker,
show that in some cases it doesn't require user delivered annotations, that are otherwise necessary,
make the compilation process faster.

How to compile?

Download FriCAS sources and compile them.
Copy src/interp/ncomp.boot to cloned repository directory.
Run patch < ncomp.boot.diff shell command to patch FriCAS interpreter.
Run FriCAS in the directory.
Inside FriCAS issue command )read initial.input.

If all steps were successful you should be in a mode that let's you run some test the new type checker. To do so issue FriCAS command )compile Test1.spad.

Logging

During type checking various messages are printed out. Each message consist of three fields:

(0.001) [Main] A very important message.
 ^       ^     ^
 |       |     |
 |       |     the beginning of a message (possibly multiline)
 |       component name, colour means logging level
 time in seconds from the moment when compiler entered new type checker code

There are 5 logging levels which roughly correspond in the meaning to unix syslog levels. Component names are coloured to mark importance of message:

level	colour
debug	magenta
info	cyan
notice	green
warn	yellow
fail	red

Logger mechanism is configurable. You can specify for each component which messages will be logged, for instance (snippet from SpadCompilerTest.spad):

loggerDefaultLevel "debug"
loggerLevel('Unify, "info")
loggerLevel('Type, "info")
loggerLevel('Parser, "notice")
loggerLevel('Main, "notice")

Components

name	responsiblities
`Daase`	Communicates and extracts type information from the database of original compiler.
`Env`	Populates and accesses old compiler environment data structure. Makes possible to read all function signatures associated with given type.
`Parser`	Translates between original (SExpression) and type checker internal (SpadNode) representation of abstract syntax trees. Performs additional syntax checks.
`AST`	Abstract syntax tree analyzer, walks over AST and constructs type facts in form of a "blackboard" data structure.
`Unify`	Performs type unification algorithm and various variations of it. Additionaly may check if one type is subtype of another type.
`Type`	Type checker algorithm. Operates on "blackboard" data structure. Infers or verifies types present in a function.

Interpreting output messages

Source AST in internal representation

This should contain the same information as input file, but the printout should be perfomed by internal code. After that an abstract syntax tree analyzer will be run, which in turn will construct a blackboard with type facts.

(0.001) [Main] Tree ready for type checking:
  Test1 () :
      foo : Integer -> Integer
      foo : Float -> Float
      bar : (Integer, Integer) -> Integer
    == ? add
      import Float
      import PositiveInteger
      import NonNegativeInteger
      foo (a : ?) ==
        a2 := (a * a)
        ma := -(a)
        if (a >= 0)
        then
          (ma * 2)
        else
          (a2 + 2)
      baz : (Integer, Integer) -> Integer
      bar (x : ?, y : ?) ==
        nx : Integer
        if (x > 0)
        then
          y := -(10)
          nx := baz(x, y)
        else
          nx := -(x)
        (nx + y)
      baz (x : Integer, y : Integer) : Integer ==
        (x * y)

Blackboard state before type checker run

These two data structures represented by TypeCheckerNodeArray domain express all information that AST phase was able to gather about the function or functor under question (foo : Integer -> Integer in this case).

Nodes printout represents AST with type information and typing rules.

(-) or (+) indicate if type checker finished inferring type for this node,
#n is node number,
there are two possiblities:
- symbol : Type : a leaf - usually a symbol,
- {...} is Type : a node with typing rule attached.

Note that some nodes have a type named %n. These are type variables stored in second data structure. If a given node there're several alternatives available [T1, T2, ...] or the type is unknown ?, a type variable will be assigned for it. Though there're cases where artificial nodes are generated - they don't correspond to any node from AST.

Last things that requires some explanation is rule format. Actually each node can be assigned several typing rules (that happens sometimes - e.g. for :: or implict elt use aka aggregate access). Each rule consist of several components.

Below is current grammar for rules (subject to change):

{r1, r2, ...} : single rule composed of rule components,
{...} | {...} | ... : alternative rules,
#n - node n must be visited before proceeding,
#n <: #m : type assigned to node n is in subtyping relation with node m,
#n := #m : node n will have a (super)type of node m, same for current node.
#n(#x, #y, ...) : function application rule, n has a function type, (x, y, ...) will be subtypes of what can be applied to the function,
#n $ Type : function types assigned to n must originate from Type domain.

There're some special type variables:

%any - represents any type (aka top type) - used to handle error statement,
%undef - undefined type (aka bottom type) assigned to each statement (e.g. if-then without else branch).

Nodes :
  (-)  #1 {#2, #3} is Integer -> Integer
  (-)  #2 a : Integer
  (-)  #3 {#4, #4 <: #3} is Integer
  (-)  #4 {#5, #11, #16, #16 <: #4} is %3
  (-)  #5 {#6, #7, #6 := #7} is %4
  (-)  #6 a2 : %5
  (-)  #7 {#8, #9, #10, #8(#9, #10)} is %6
  (-)  #8 * : %7
  (-)  #9 a : Integer
  (-) #10 a : Integer
  (-) #11 {#12, #13, #12 := #13} is %10
  (-) #12 ma : %11
  (-) #13 {#14, #15, #14(#15)} is %12
  (-) #14 - : %13
  (-) #15 a : Integer
  (-) #16 {#17, #22, #26, if #17 then #22 else #26} is %24
  (-) #17 {#18, #19, #20, #18(#19, #20)} is %15
  (-) #18 >= : (Integer, Integer) -> Boolean $ Integer
  (-) #19 a : Integer
  (-) #20 {#21, #21()} is %18
  (-) #21 Zero : () -> Integer $ Integer
  (-) #22 {#23, #24, #25, #23(#24, #25)} is %20
  (-) #23 * : %21
  (-) #24 ma : %11
  (-) #25 2 : Integer
  (-) #26 {#27, #28, #29, #27(#28, #29)} is %22
  (-) #27 + : (Integer, Integer) -> Integer $ Integer
  (-) #28 a2 : %5
  (-) #29 2 : Integer
Type variables :
    %3 : ?
    %4 : ?
    %5 : ?
    %6 : ?
    %7 : [(Integer, Integer) -> Integer $ Integer, (NonNegativeInteger, Integer) -> Integer $ Integer, (PositiveInteger, Integer) -> Integer $ Integer]
   %10 : ?
   %11 : ?
   %12 : ?
   %13 : [(Integer, Integer) -> Integer $ Integer, Integer -> Integer $ Integer]
   %15 : ?
   %18 : ?
   %20 : ?
   %21 : [(Integer, Integer) -> Integer $ Integer, (NonNegativeInteger, Integer) -> Integer $ Integer, (PositiveInteger, Integer) -> Integer $ Integer]
   %22 : ?
   %24 : [%20, %22, Union(%20, %22), Union(%22, %20)]

Blackboard state after type checker run

This is how a complete type checked tree should look like. For your convenience (+) are coloured in green and (-) in red.

This example doesn't show that, but final tree must not have any rule alternatives in any of nodes.

  (+)  #1 {#2, #3} is Integer -> Integer
  (+)  #2 a : Integer
  (+)  #3 {#4, #4 <: #3} is Integer
  (+)  #4 {#5, #11, #16, #16 <: #4} is Integer
  (+)  #5 {#6, #7, #6 := #7} is Integer
  (+)  #6 a2 : Integer
  (+)  #7 {#8, #9, #10, #8(#9, #10)} is Integer
  (+)  #8 * : (Integer, Integer) -> Integer $ Integer
  (+)  #9 a : Integer
  (+) #10 a : Integer
  (+) #11 {#12, #13, #12 := #13} is Integer
  (+) #12 ma : Integer
  (+) #13 {#14, #15, #14(#15)} is Integer
  (+) #14 - : Integer -> Integer $ Integer
  (+) #15 a : Integer
  (+) #16 {#17, #22, #26, if #17 then #22 else #26} is Integer
  (+) #17 {#18, #19, #20, #18(#19, #20)} is Boolean
  (+) #18 >= : (Integer, Integer) -> Boolean $ Integer
  (+) #19 a : Integer
  (+) #20 {#21, #21()} is Integer
  (+) #21 Zero : () -> Integer $ Integer
  (+) #22 {#23, #24, #25, #23(#24, #25)} is Integer
  (+) #23 * : (Integer, Integer) -> Integer $ Integer
  (+) #24 ma : Integer
  (+) #25 2 : Integer
  (+) #26 {#27, #28, #29, #27(#28, #29)} is Integer
  (+) #27 + : (Integer, Integer) -> Integer $ Integer
  (+) #28 a2 : Integer
  (+) #29 2 : Integer

Type annotated tree passed to original compiler

Original tree passed to new type checker will be rewritten and type annotations will be added.

  Test1 () :
      foo : Integer -> Integer
      foo : Float -> Float
      bar : (Integer, Integer) -> Integer
    == Type add
      foo (a : Integer) : Integer ==
        a2 : Integer := (* $ Integer)(a, a)
        ma : Integer := (- $ Integer)(a)
        if (>= $ Integer)(a, Zero $ Integer)
        then
          (* $ Integer)(ma, 2)
        else
          (+ $ Integer)(a2, 2)
      baz : (Integer, Integer) -> Integer
      bar (x : Integer, y : Integer) : Integer ==
        nx : Integer
        if (> $ Integer)(x, Zero $ Integer)
        then
          y := (- $ Integer)(10)
          nx := baz(x, y)
        else
          nx := (- $ Integer)(x)
        (+ $ Integer)(nx, y)
      baz (x : Integer, y : Integer) : Integer ==
        (* $ Integer)(x, y)

Name		Name	Last commit message	Last commit date
Latest commit History 355 Commits
boot		boot
common		common
papers		papers
spad		spad
spadil		spadil
.gitignore		.gitignore
.travis.yml		.travis.yml
DaaseDatabase.spad		DaaseDatabase.spad
FoamCompiler.spad		FoamCompiler.spad
Logger.spad		Logger.spad
Makefile		Makefile
ParserTools.spad		ParserTools.spad
Printer.spad		Printer.spad
README.md		README.md
SpadCompilerPasses.spad		SpadCompilerPasses.spad
SpadCompilerTest.spad		SpadCompilerTest.spad
SpadEnvironment.spad		SpadEnvironment.spad
SpadEnvironmentCategory.spad		SpadEnvironmentCategory.spad
SpadLogic.spad		SpadLogic.spad
SpadNode.spad		SpadNode.spad
SpadNodeCollect.spad		SpadNodeCollect.spad
SpadNodeFactory.spad		SpadNodeFactory.spad
SpadNodePredicate.spad		SpadNodePredicate.spad
SpadNodeRewrite.spad		SpadNodeRewrite.spad
SpadNodeTools.spad		SpadNodeTools.spad
SpadNodeWalk.spad		SpadNodeWalk.spad
SpadTypeChecker.spad		SpadTypeChecker.spad
SpadTypeDatabase.spad		SpadTypeDatabase.spad
SpadTypeDefs.inc		SpadTypeDefs.inc
SpadTypeEvaluator.spad		SpadTypeEvaluator.spad
SpadTypeTree.spad		SpadTypeTree.spad
SpadTypeTreeCreator.spad		SpadTypeTreeCreator.spad
SpadTypeUnifier.spad		SpadTypeUnifier.spad
Test1.spad		Test1.spad
Test10.spad		Test10.spad
Test11.spad		Test11.spad
Test12.spad		Test12.spad
Test13.spad		Test13.spad
Test2.spad		Test2.spad
Test3.spad		Test3.spad
Test4.spad		Test4.spad
Test5.spad		Test5.spad
Test6.spad		Test6.spad
Test7.spad		Test7.spad
Test8.spad		Test8.spad
Test9.spad		Test9.spad
axiom-run		axiom-run
compile.input		compile.input
foam.input		foam.input
fricas.patch		fricas.patch
testinit.input		testinit.input

cahirwpz/phd

Folders and files

Latest commit

History

Repository files navigation

PhD thesis code

SPAD language type checker

Objectives

Deliverables

How to compile?

Logging

Components

Interpreting output messages

Source AST in internal representation

Blackboard state before type checker run

Blackboard state after type checker run

Type annotated tree passed to original compiler

About

Topics

Resources

Stars

Watchers

Forks

Languages