Cameron Purdy edited this page Dec 8, 2023 · 15 revisions

Defining "type system"

The Ecstasy type system, language, compiler, and core libraries are designed to work in harmony to securely and effectively deliver the capabilities described in this guide. The term "type system" is often simultaneously vague and overused, so this chapter is intended to bring the concepts down to earth, and to illustrate the purpose of the design, and the value that it delivers to the programmer.

First, in Ecstasy there is actually such a thing as a TypeSystem class, so we're going to quickly stop using that term in the general programming-language sense, other than to explain what it means in the generic sense: A language's type system refers to the abstractions and their classifications that a language provides around the raw data that the language machinery manages on behalf of the programmer. In a good design, those abstractions and classifications are used to efficiently avoid or constrain undesirable behavior, and to simplify and streamline the actual desired behavior.

In other words: At some level it's all ones and zeros, but a type system helps the developer pretend instead that it's actually Int, Boolean, String, Map, and ShoppingCart (etc.) objects, with lots of rules about what is legal or illegal to do with each of them.

So let's get the type system basics out of the way:

  • The Ecstasy type system is static. That simply means that the compiler "statically" (i.e. at compile-time) enforces language type rules. Not all of the rules can be enforced at compile time, but as many as can be enforced at compile time, are enforced at compile time. Some rules that cannot be enforced at compile time are enforced at load-and-link-time instead. And some rules are enforced at runtime, only because they cannot be enforced any earlier.
  • The Ecstasy type system is strong. That simply means that Ecstasy doesn't allow running code to ever mistakenly use an object of one type in a place that requires a different type. For example, an attempt to assign an Int value to a String variable will always be rejected: by the compiler as a compile-time error where that is possible, otherwise by the linker as a link-time error, and otherwise at runtime -- and if it is detected at runtime, the illegal operation is prevented before it can do any harm to our precious ones and zeros!
  • The Ecstasy type system carries complete runtime type information. At runtime, the class of every object is known, and the type of every reference is known. Detailed structural information about the class is also known, and is available via reflection. Furthermore, the compiler embeds specific compile-time type information that cannot otherwise be reconstructed at runtime; for example, when two objects are being compared for equality, the types that those references were "known to be of" by the compiler are included in the compiled form of the code, so that the runtime can make use of that information. Finally, type information is not erased, such as in the case of generic data types; this is referred to as a "reified type system", or as "reified types" -- which literally means "types made real".
  • The Ecstasy type system is based on nominative types. This over-complicated term just means that types can be identified by their names, and their composition is formed by referring to other types by their names. For example, a class of one name may extend another class by specifying the other class' name. (An example from an earlier chapter is when the Point3D class extended the Point class.) To construct an instance of a type, the name of the class being instantiated must be specified, such as new Point(0,0). And so on.
  • The Ecstasy type system also supports "duck typing", but only for interfaces. This hilarious term is a play on the saying: "If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck." What this means is that if a class provides all of the members of an interface, then the class automatically implements that interface, even if the class does not specify that interface by name in its implements list. This exception to the nominative typing rule is permitted because the class "looks like" that interface and "swims like" that interface and "quacks like" that interface. Duck typing is useful for representing functionality from an existing class as an interface -- especially when you aren't able to change the class to add an explicit implements clause. It turns out that this is quite common in the real world, such as when that class is in a module that is already deployed to production, or when that class is in someone else's module and you want to or need to work with that module -- but you're not able to change their class in their module.
  • Other than the explicit duck typing support for interfaces, Ecstasy does not support structural typing. This just means that two classes are not considered to be of the same type just because they happen to have the same number of properties of the same types, for example.
  • Ecstasy supports a type algebra that includes parameterized types; union, intersection, and difference types; tuple types; explicitly immutable types; access-controlled types; and annotated types.
  • In Ecstasy, "everything is an object", and therefore everything is of an object type. Every bit is an object, of the Bit class. Every integer is an object, of the appropriate Int class. And so on. There is no separate "primitive type system" that the language pre-defines and hard-codes, like in C++, Java, or C#. This Ecstasy type system design is called the "turtles type system", because it's turtles the whole way down; it has no external references or dependencies on types outside of its own object type system. For example, an Int is built on an array of Bit, which in turn is built on an IntLiteral, which in turn is built on a String, which in turn is built on an array of Char, which in turn is built on a UInt32 code point, which in turn is built on an array of Bit, and it just keeps going, recursively, forever.
  • Ecstasy separates types from classes. This is a simple concept in some ways, and a terrifyingly complex concept in other ways. So let's try to keep it simple: The class of an object is what the object actually is. But in Ecstasy, an object can never be directly "touched" or "held" by the programmer; instead, every interaction with an object is via a reference to that object. When we "assign an object to a variable", what we really mean is that we are storing a reference-to-that-object in that variable. When we "set a property to a value", what we are actually doing is storing a reference-to-the-value in that property (and by "value", we mean "some object"). When we "pass an object to a method", we are actually passing a reference-to-the-object to the method. Even from within an object, the object's own interactions with itself are performed via its this reference; an object can't even touch itself! But in each case, it is the reference that has a type; the type is part of the reference, while the class is part of the object. And since this is Ecstasy, everything is an object, including every reference.
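To make the last point concrete, here is a minimal sketch in which one object is reached through references of two different types. It has not been compiled for this page; the module name TypeDemo is invented for the illustration, and the sketch assumes that the injected Console and the is() type test behave as they do in the current runtime.

```ecstasy
// illustrative sketch only (not compiled); "TypeDemo" is a made-up module name
module TypeDemo {
    void run() {
        @Inject Console console;

        String s = "hello";   // the reference type is String
        Object o = s;         // same object, but this reference type is Object

        // "o.size" would be rejected here: the Object type has no "size"
        // member, even though the class of the underlying object is String

        if (o.is(String)) {
            // assumption: the runtime type test narrows "o" to String in this block
            console.print($"size={o.size}");
        }
    }
}
```

The class of the object never changes in this sketch; only the type of the reference used to reach it does.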

Defining a "type"

So now that we have defined what a type system is, or at least what it's supposed to do, we need to define what a type is. This is a bit harder than it sounds, so we're going to approach it from three different directions:

  1. A type is about its composition. Ecstasy types are defined by how they are formed:
  • Nominative. If there is a class called Int, then there is a type called Int. This includes any class, interface, mixin, const, enum, module, package, service, or typedef name.
  • This Type. Within a class, when a method or property declaration refers to its own class name, the implied type is the "auto-narrowing this type", which is a special form of a nominative type. For example, the toUnchecked() method on Int64 returns an @Unchecked Int64, which means "this type, unchecked".
  • Access. With a nominative type, the default type view of the class is the public interface, but it is possible to specify any of four pre-defined types exposed by a class: public, protected, private, and struct. The first three of these are self-explanatory; the struct access refers to the structure type of the class, which provides the underlying passive field storage for each property of the class that may hold a value. For example, the type "(private Person)" refers to the "private view" of the Person class.
  • Immutable. A type can be indicated to be explicitly immutable. For example, the type Map<String, Int> can refer to either a mutable or immutable Map, but the type immutable Map<String, Int> refers explicitly to the immutable form of the class.
  • Annotated. A type annotation in Ecstasy is when a mixin is used to add information to another type. For example, the Unchecked mixin can be applied to an Int64 using an annotation: @Unchecked Int64.
  • Parameterized. A parameterized type is parameterized using other types. For example, the List interface is declared as parameterized by an Element type, so parameterizing the nominative List type with the nominative Int type results in the List<Int> parameterized type.
  • Tuple. A Tuple is a specially parameterized type form, which supports zero or more fields, each of which is identified by a zero-based index and is defined with its own type. For example, a Tuple<Int, String> is a tuple with two fields (also called a pair), with field [0] being an Int and field [1] being a String.
  • Union. A union type is used to refer to one of two different types, as in "type A or type B". For example, String is a const class, and Nullable is an enum class. If a variable can hold either the Null value or a String, we say that the variable type is a union Nullable|String, pronounced as "nullable or string". Because that "Nullable|" prefix is so common, there exists a short-hand that means the same thing: String?.
  • Intersection. An intersection type is used to refer to the combination of two different types, as in "type A plus type B", or "both type A and type B". As mentioned above, List is parameterized by Element. There is also an interface called Freezable; an Element that is also known to be Freezable can be asserted to have the intersection type Element+Freezable; you can see this type at work in the ListFreezer implementation.
  • Difference. A difference type is used to refer to the absence of a type, as in "type A minus type B", or "type A and not type B". For example, the toChecked() method on Int64 returns the (Int64 - Unchecked) type, which means "this type, and not Unchecked".
  2. A type is about its capabilities. Ecstasy types are defined by what they can do:
  • A type contains type members. To simplify slightly, the members are the properties and the methods of the type.
  • If the type of a reference has a readable property (the property, exposed as a Ref), then anyone with that reference can use it to read the value of that property on the object.
  • If the type of a reference has a writable property (the property, exposed as a Var), then anyone with that reference can use it to read and write the value of that property on the object.
  • If the type of a reference has a method, then anyone with that reference can use it to invoke that method on the object.
  • In each of these cases, it doesn't matter if the members are public, protected, or private; these keywords are used for organization, and not for security. The actual type of the reference is what determines what is allowed to be done using the reference.
  3. A type is about constraints and guarantees. Here are the most common examples:
  • If a property or a variable has a certain type, then only references of that type can be stored in that property or variable, and it is guaranteed that any reference obtained from the property or variable will be of that type.
  • If a method or function declares a parameter of a certain type, then only arguments of that type can be passed, and from the method's or function's point of view, it is guaranteed that the arguments will be of the defined types.
  • If a method or function declares a return value of a certain type, then only values of that type can be returned by the code in the method or function, and from the caller's point of view, it is guaranteed that any returned values will be of the defined type.
  • If a class defines a type parameter with a type constraint, then the class can only be parameterized by types that match that type constraint. For example, in the definition class NumList<Element extends Number> extends List<Element>, it is guaranteed that the type Element is a Type<Number>, which is an excellent segue to the next topic ...
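Before moving on, several of the composition forms from the first list can be sketched as plain variable declarations. This is an uncompiled illustration: the module name CompositionDemo and all variable names are invented here, and the last two lines assume that a constant tuple index yields the field's declared type and that a String (a const) satisfies an explicitly immutable type.

```ecstasy
// illustrative sketch only (not compiled); all names are made up for this example
module CompositionDemo {
    void run() {
        Int                count  = 42;            // nominative
        List<Int>          nums   = [1, 2, 3];     // parameterized: List with Element=Int
        Tuple<Int, String> pair   = (1, "one");    // tuple: field [0] is Int, field [1] is String
        Nullable|String    maybe1 = Null;          // union, written out in full
        String?            maybe2 = "text";        // the same union, using the short-hand

        // assumption: indexing a tuple with a constant yields that field's type
        Int first = pair[0];

        // assumption: String is a const, so it is acceptable as an immutable Object
        immutable Object frozen = "text";
    }
}
```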

Defining "is a" and "assignable to"

We've already explained that assigning a Null value to a non-Nullable type is not legal:

String s = Null;     // error: String required; Nullable found

It's also obvious that you can't do things like this:

Int n = "Hello!";    // error: Int required; String found
if (n) {...}         // error: Boolean required; Int found

Yet some similar code would be legal:

// even though the variable is not declared explicitly as being Nullable,
// it is declared as being an Object, and the Null value is an Object
// (because "everything is an Object")
Object o = Null;

// similarly, since Nullable is an enum containing the single value Null,
// the Null value is an Enum value
Enum e = Null;

// the String class doesn't know anything about this interface,
// but the String class does have a "size" property of type Int
interface HasASizeProperty { @RO Int size; }
HasASizeProperty example = "Quack!";

There are two fundamental relationships that define which of the above are legal vs illegal, and why:

  • The is-a relationship defines a relationship between two types, such that if type B is-a type A, then a reference of type B can be used anywhere that a reference of type A is required. The cliché example of an is-a relationship in an object-oriented language is when type B is a sub-class of type A, but that is just one of many examples of an is-a relationship in Ecstasy.
  • The assignable-to relationship defines a relationship between two types, such that type B is assignable-to type A if either (1) type B is-a type A, or (2) type B has an @Auto conversion method that returns a type that is-a type A. Just as there are no "primitive" hard-wired types in the language, there are also no "hard-wired" compiler conversions; the only way that an object can be converted to a different type is for the class of that object to define a method that performs the conversion.

With that information, let's review the above examples to understand why each was either legal or illegal:

  • The assignment String s = Null is illegal, because Null is not a String, and Null has no @Auto conversion method to String.
  • The assignment Int n = "Hello!" is illegal, because String is not an Int, and String has no @Auto conversion method to Int.
  • The statement if (n) {...} is illegal, because Int is not a Boolean, and Int has no @Auto conversion method to Boolean.
  • The assignment Object o = Null is legal, because Null is-a Object.
  • The assignment Enum e = Null is legal, because Null is-a Enum.
  • The assignment HasASizeProperty example = "Quack!" is legal, because the String type is-a HasASizeProperty (by duck typing).
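
None of the examples above exercises the second half of the assignable-to rule, so here is a minimal sketch of an @Auto conversion. The Meters class and the AutoDemo module are hypothetical and exist only for this illustration, and the code has not been compiled; it assumes that @Auto is applied to an ordinary no-argument method whose return type is the conversion target.

```ecstasy
// illustrative sketch only (not compiled); Meters and AutoDemo are hypothetical
module AutoDemo {
    const Meters(Int value) {
        // assumption: marking the method @Auto lets the compiler call it
        // whenever a Meters is supplied where an Int is required
        @Auto Int toInt() {
            return value;
        }
    }

    void run() {
        Meters m = new Meters(5);

        // Meters is not an Int, so the is-a test fails; the assignment is
        // still legal because Meters has an @Auto method returning an Int
        Int n = m;
    }
}
```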
⚠️ Advanced topic

The full definition of the is-a relationship is technically complex, but we can cover 99% of its complexity in a small number of statements; for two types, A and B, type B is-a type A if any of the following are true:

  • A and B are the same exact type;
  • A and B are both class types, and B is a sub-class of A;
  • A is an interface, and B implements or delegates an interface that is-a A;
  • A is an interface, and B duck-types that interface;
  • A is a mixin, and B is annotated by (or incorporates) that mixin;
  • B is a mixin, and it mixes into a type that is-a A;
  • A is a union type of two types A1 and A2, and B is-a A1 and/or B is-a A2;
  • A is an intersection type of two types A1 and A2, and B is-a A1 and B is-a A2;
  • A is an explicitly immutable A1, and B is-a A1 and B is implicitly immutable (a const, enum, package, or module);
  • B is an explicitly immutable B1, and B1 is-a A;
  • B specifies a public/protected/private access such as (protected B1) and A does not specify an access, and B1 is-a A;
  • Both A and B specify a public/protected/private access on A1 and B1 respectively, and the level of access specified for B is greater than or equal to the level of access specified for A and B1 is-a A1;
  • B specifies a struct access and A does not specify an access, and the type Struct is-a A;
  • Both A and B specify a struct access on A1 and B1 respectively, and B1 is-a A1;
  • B is a parameterized B1 and A is not parameterized, and B1 is-a A;
  • B is a parameterized B1 and A is a parameterized A1, and B1 is-a A1 and for each parameter Pb of B, either A has no corresponding parameter Pa, or Pa and Pb are the same exact type.

Type Variance

With each of the above rules, it is possible to prove absolutely that some type B is-a type A. There are also cases in which it is possible to weakly prove that some type B is-a type A at compile time; it is a weak proof because it may actually be proven wrong at runtime. The technical reason why such a condition is permitted to exist is known as type variance; variance occurs when a type is permitted to vary, including (1) in a sub-class, (2) when an interface is implemented, and (3) when a type is parameterized. Let's look at an example in which type variance is extremely safe:

interface Lookup {
    Object find(String key);
}

class StringCache implements Lookup {
    @Override
    String find(String key) {...}
}

The interface defined a find() method that returned any Object, but the class (which implemented the interface) explicitly narrowed the return type to String. The simple rule involved here is that variance is generally considered to be safe when narrowing a return type or widening a parameter type; this aligns well with Postel's law: "Be liberal in what you accept, and conservative in what you send." Another common law states: "Contra-variant parameters; covariant returns", which means: When narrowing a type, such as by extending a class or implementing an interface, method parameters can safely widen and return values can safely narrow.

  • Invariance - When a type narrows (e.g. when a class is subclassed) and its method parameters and return types do not change, they are type invariant.
  • Covariance - When a type narrows and its method parameter and/or return types narrow as well, they are type covariant (because they are varying in the same direction).
  • Contra-variance - When a type narrows and its method parameter and/or return types widen, they are type contra-variant (because they are varying in the opposite direction).
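
The StringCache example above narrows a return type; the opposite direction, widening a parameter, can be sketched in the same style. The names TextHandler and AnyHandler are invented for this illustration and the code has not been compiled; it assumes that an override may declare a wider parameter type, as the contra-variance rule describes.

```ecstasy
// illustrative sketch only (not compiled); both names are made up
interface TextHandler {
    void handle(String text);
}

class AnyHandler implements TextHandler {
    // the parameter type widens from String to Object, so every call that
    // is legal through a TextHandler reference is still legal here
    @Override
    void handle(Object value) {}
}
```

A caller holding a TextHandler reference can only pass a String, and a String is-a Object, so the widened method can never receive something that it does not accept.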
⚠️ Advanced topic

Ecstasy allows contra-variant parameters and covariant returns, and this is fully checked at compile time. Variance checks on generic types cannot be fully performed at compile time; generic types are types that have type parameters. Ecstasy defines two terms that are used for the type variance rules on generic types:

  • Consumes: A type consumes another type T iff it contains a method or property that consumes T. A method consumes T iff (i) it has a parameter of type T; (ii) it has a parameter of a type that produces T; or (iii) it has a return of a type that consumes T. A property consumes T iff (i) the type of the property is T and the property is settable; (ii) the type of the property produces T and the property is settable; or (iii) the type of the property consumes T.
  • Produces: A type produces another type T iff it contains a method or property that produces T. A method produces T iff (i) it has a return of type T; (ii) it has a parameter of a type that consumes T; or (iii) it has a return of a type that produces T. A property produces T iff (i) the type of the property is T; (ii) the type of the property consumes T and the property is settable; or (iii) the type of the property produces T.
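
The two definitions are mutually recursive, which is easier to follow with each rule attached to a concrete member. The three interfaces below (Source, Sink, and Pipe) are invented for this illustration and have not been compiled; each comment names the method rule from the definitions above that applies.

```ecstasy
// illustrative sketch only (not compiled); Source, Sink and Pipe are made-up names
interface Source<Element> {
    Element next();                          // produces (i): return of type Element
}

interface Sink<Element> {
    void put(Element value);                 // consumes (i): parameter of type Element
}

interface Pipe<Element> {
    void drainTo(Sink<Element> sink);        // produces (ii): parameter of a type that consumes Element
    Source<Element> reader();                // produces (iii): return of a type that produces Element

    void fillFrom(Source<Element> source);   // consumes (ii): parameter of a type that produces Element
    Sink<Element> writer();                  // consumes (iii): return of a type that consumes Element
}
```

Under these rules Source only produces Element and Sink only consumes Element, while Pipe does both.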

Here is an example of a generic type that produces the type parameter Element, but does not consume Element:

interface IndexedExtractor<Element> {
    Element getElement(Int index);
}

Let's consider four different possible assignments of this type, parameterized by two different values of Element:

// these two assignments are obviously correct;
// the left hand side type and right hand side type match
IndexedExtractor<Object> objExtractor1 = new IndexedExtractor<Object>() {...};
IndexedExtractor<String> strExtractor1 = new IndexedExtractor<String>() {...};

// if an extractor produces a String, and a String "is a" Object, then it is
// reasonable to assume that a String extractor "is a" Object extractor, and
// Ecstasy allows this to compile, even though it may need to add run-time
// checks as a result
IndexedExtractor<Object> objExtractor2 = strExtractor1;

// an Object extractor is allowed to produce any Object, while a String
// extractor can only produce String objects, so an IndexedExtractor<Object>
// does not meet the contract of the IndexedExtractor<String> type, and the
// compiler will reject this assignment as a type error
IndexedExtractor<String> strExtractor2 = objExtractor1;

In other words, when a type only produces a type parameter, as illustrated in this IndexedExtractor<Element> example, the widening of the type parameter Element results in the widening of the type IndexedExtractor<Element>, and assignment is allowed because IndexedExtractor<String> is-a IndexedExtractor<Object> -- although only weakly so. This is an example of type covariance, since the generic type and its type parameter both widen and narrow together.

Here is an example of a type that consumes the type parameter Element, but does not produce Element:

interface Logger<Element> {
    public void add(Element value);
}

Let's consider four different possible assignments of this type, parameterized by two different values of Element:

// these two assignments are obviously correct;
// the left hand side type and right hand side type match
Logger<Object> objLogger1 = new Logger<Object>() {...};
Logger<String> strLogger1 = new Logger<String>() {...};

// if there is a logger that will only log String values, and we need a
// reference to a logger that will log any Object, then it's obvious that
// a Logger<String> cannot meet the contract of the Logger<Object> type,
// and the compiler will reject this assignment as a type error
Logger<Object> objLogger2 = strLogger1;

// if there is a logger that will log any Object, and a String "is a" Object,
// then it is reasonable to assume that an Object logger "is a" String logger,
// and Ecstasy allows this to compile, even though it may need to add run-time
// checks as a result
Logger<String> strLogger2 = objLogger1;

In other words, when a type only consumes a type parameter, as illustrated in this Logger<Element> example, the narrowing of the type parameter Element results in the widening of the type Logger<Element>, and assignment is allowed because Logger<Object> is-a Logger<String> -- although only weakly so. This is an example of type contra-variance, since the generic type widens when its type parameter narrows, and vice-versa.

Not coincidentally, the Ecstasy Array class, which implements the List interface, has a legal is-a relationship (via duck typing) with the two example interfaces above:

Array<String>            strs          = ["hello", "world"];
IndexedExtractor<String> strExtractor3 = strs;
Logger<String>           strLogger3    = strs;

The Array type both consumes and produces the type parameter Element, which affects the "is-a" rules for type variance. Once again, let's consider four different possible assignments of the type, parameterized by two different values of Element:

// these two assignments are obviously correct;
// the left hand side type and right hand side type match
Array<Object> objArray1 = [1, "test", Null];
Array<String> strArray1 = ["hello", "world"];

// this is the big question: is an "Array of String" an "Array of Object"?
// after all, a String is-a Object, and an Array is-a Array, so Ecstasy 
// chooses to allow "Array<String> is-a Array<Object>" to be (weakly) true
Array<Object> objArray2 = strArray1;

// an Object array could contain anything, while a String array can only
// contain String objects, so an Array<Object> does not meet the contract
// of the Array<String> type, and the compiler will reject this assignment
// as a type error
Array<String> strArray2 = objArray1;

Up until this point, all of the "weak" is-a examples of type variance with generic types have been fairly straight-forward from a technical standpoint, but this Array example is different: It's quite simple to abuse this "weak" is-a result by employing the classic cliché "cats and dogs" example:

Cat[]    cats    = [new Cat("Tabby"), new Cat("Russian Blue")];
Animal[] animals = cats;
animals += new Dog("Wolf");   // <- throws an exception!

The exception itself is worth looking at:

Exception: Missing operation "+" on Array<test:Cat>
    at run() (test.x:9)

The exception tells us that the source code was in the aptly-named file "test.x" and that the exception occurred at line 9, where an attempt was made to add a Dog object to an Array<Cat>. It's not a type mismatch exception, though; instead, it's an exception that says "the compiler assumed that there would be an operator on the underlying type that could add a Dog to the array, and the actual type at runtime didn't have any such operator". In other words, the compiler purposefully allowed what it knew to be a potentially incorrect assumption, in order to support covariant generic types.

Many languages, including Java and C#, are purposefully type invariant for generic types; in Java or C#, a List<Cat> cannot be assigned to a List<Animal>. Java supports a limited wild-card syntax, combinable with super and extends keywords, that in practice is nearly unusable. C# does support limited type variance for interfaces by adding in and out keywords to an interface's type parameters that need to be allowed to vary; Scala supports a similar notation using +/-. After carefully evaluating the approaches used in a dozen popular languages, for the design of the Ecstasy language we chose to automatically infer type variance based on the produces/consumes rules, instead of either (i) disallowing variance or (ii) introducing a complicated explicit syntax. The Ecstasy design is based on the real-world utility of type variance to developers, the benefits of automatically inferring valid type variance, and the near-zero incidence of real-world errors caused by type variance -- even in the absence of the additional rule-based compile-time type checking that Ecstasy applies to generic types. And regarding the dogs and cats: No animals were harmed in the making of this language.

The real goal in the Ecstasy design was to somehow automatically "do the right thing", without the developer actually having to stop their progress and switch gears to "cut-and-paste from StackOverflow.com" mode, to look up the complex rules of type variance in the language documentation, or to try random code incantations until something actually compiles and looks-like-it-works without ever understanding the underlying technical concerns.
