Skip to content

Type system

Wojciech Biela edited this page Apr 7, 2016 · 16 revisions

(jotting down ideas for formalizing the type system)

Meta-language for describing types

A type expression can be described by the following language:

TypeExpression
 : Constant
 | TypeConstructorInvocation
 | Variable
 | Tuple
   
Constant
 : Integer
 | TypeName

TypeConstructorInvocation
 : TypeName '(' TypeExpression (',' TypeExpression)* ')'

TypeName
  : IDENTIFIER

Variable
  : IDENTIFIER   # TODO: solve ambiguity with TypeName

Tuple
  : '{' (TypeExpression (',' TypeExpression)*)? '}

Some examples are:

  • bigint
  • array(bigint)
  • map(varchar(10), bigint)
  • array(a)
  • ->({array(a), array(a)}, array(a))

The last entry is the type of a function that takes two arrays of some arbitrary type as arguments and returns an array of that same type (e.g., a 2-arg concat function). This form will be necessary for representing the type of lambdas.

A "type constructor invocation" is the instantiation of what we currently call a "parametric type"

TODO

High level todo-list around typesystem. Subject to change:

  • Unification of type parameters (type parameters, literal parameters, type calculation parameters, named type parameters for ROWs etc) - https://github.com/facebook/presto/issues/3938
  • finalize parametrized VARCHAR and DECIMAL pull requests
  • Allow SqlType to only reference types, type variables and constants in TypeSignatureParameter. Currently TypeLiteralCalculation is also allowed. Introduce a TypeConstraint annotation to express the relationship between the arguments of the input types and the output types.
  • (re)design a well defined API to register and query about coercion, type hierarchy and other type metadata
    • Currently type logic is scattered around the system
      • static methods in FunctionRegistry
      • static methods in TypeUtils
    • Some goals:
      • Have a TypeHierarchy object which is passed around the system and exposes type hierarchy information. Each type would register itself in TypeHierarchyBuilder with information about available coercions, type-only coercions etc.
      • Some functionality now exposed as static methods should be exposed in more objective manner as methods of Type/TypeSignature interfaces. E.g. TypeUtils.resolveCalculatedType.
      • Merge Type and ParametricType interfaces
  • Type inference (not bottom-up but "real", e.g. Hindley-Milner algorithm) https://www21.in.tum.de/~nipkow/pubs/aplas11.pdf ("Extending Hindley-Milner Type Inference with Coercive Structural Subtyping")
  • lambdas (based on "real" type inference)
  • extract signature parser as separate entity
    • feed it with the type metadata API handle/object reference
    • Make it ANTLR based
  • rework ROW type syntax
  • support computation of the return type of a vararg function with literal parameters
  • extend supported functions set for DECIMAL
  • add coercion decimal -> double, decimal -> bigint
  • benchmark/improve performance for long DECIMALs (>64bits)
  • make DECIMAL default type for literals (eg. "1.0")
Clone this wiki locally